Xaprb

Stay curious!

How MySQL replication got out of sync

with 8 comments

I created MySQL Table Checksum because I was certain replication slaves were slowly drifting out of sync with their masters, and there was no way to prove it. Once I could prove it, I was able to show that replication gets out of sync for lots of people, lots of times. (If you really want to hear war stories, you should probably talk to one of the MySQL support staff or consulting team members; I’m sure they see this a lot more than I do).

I finally figured out what was causing one of my most persistent and annoying out-of-sync scenarios. It turns out to be nothing earth-shaking; it’s just an easy-to-overlook limitation of statement-based replication. You could call it a bug, but as far as I can see, there’s no way to fix it with statement-based replication. (I’d love to be proven wrong). Read on for the details.

The setup

Here’s the table I saw getting out of sync, usually within hours of being synced:

CREATE TABLE `workpriority` (
  `client` smallint(5) unsigned NOT NULL,
  `workunit` bigint(20) NOT NULL,
  `priority` float NOT NULL,
  `processor` int unsigned NOT NULL default '0',
  PRIMARY KEY  (`client`,`workunit`),
  KEY `priority` (`priority`),
  KEY `processor` (`processor`,`priority`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

This table is essentially a queue of work that needs to be done, along with the priority of each item (workunit refers to another table, where the application retrieves the work to do). Applications work against this table in parallel. They run an UPDATE statement to claim some work for themselves, then fetch the rows the statement affected. To avoid any race conditions, the “token” is the result of the CONNECTION_ID() function. Here’s the statement:

update workpriority as p
   inner join (
      select client, workunit
      from workpriority
      where processor = 0
      order by priority desc
      limit 10
   ) as chosen using(client, workunit)
   set p.processor = $cxn_id;

The nested SELECT statement finds 10 unclaimed (processor = 0) rows in order of descending priority. The outer UPDATE statement claims these rows by setting processor to CONNECTION_ID().

Now the application can find out what work it claimed with a simple SELECT with the token in the WHERE clause. Later, after the application processes each row, it issues the following statement to clean out the table:

delete from workpriority where client = ? and workunit = ? and processor = ?;

The problem

The problem seemed to be that some binary log events were not getting replayed on the slave. This table accumulated extra rows on the slaves, as though the DELETE statements weren’t getting to the slaves. To test this, I compared the logs and determined that it’s not a logging issue; the binary log events are getting to the slave and replaying just fine. I can see them in the slave’s relay log and in the slave’s binary log (I have log_slave_updates enabled).

So if that’s not the problem, what is?

The bug

I already showed you the bug. If you didn’t see it, well, neither did I — for a year.

If you still don’t see it, here’s a hint: the slaves get out of sync in totally different ways. In other words, the slaves don’t even match each other after a little while.

The problem is that ORDER BY... LIMIT is non-deterministic. If several rows are tied for priority, the slaves might (and do!) order them differently than the master did. Then the the UPDATE statement claims different rows on the slaves. Some rows that have been claimed on the master are still marked as 0 on the slave. Then they don’t get deleted by the DELETE statement. I was able to confirm this by running a script that does a checksum on this table every few minutes, then as soon as it finds differences dumps the whole table on both the master and the slave. I was able to find some rows that the application hadn’t deleted yet. Sure enough, some of them weren’t claimed on the slave.

The fix

Very simple: resolve the ties. The query now causes a filesort because it can’t use the index to sort, but it’s not that big a table and this query doesn’t run that often. Here’s the fixed query:

update workpriority as p
   inner join (
      select client, workunit
      from workpriority
      where processor = 0
      order by priority desc, workunit
      limit 10
   ) as chosen using(client, workunit)
   set p.processor = $cxn_id;

This limitation of statement-based replication is so basic and simple, and I’ve known about it for a long time, but it’s so innocuously hidden in plain sight that it took me forever to see it.

Written by Xaprb

November 8th, 2007 at 7:30 pm

8 Responses to 'How MySQL replication got out of sync'

Subscribe to comments with RSS

  1. I got another out-of-sync problem when there was a DEADLOCK on the master and when replicating to slave, the statement still worked fine. And ‘cos of different error on master and slave, the replication stopped!!!

    The error is something like below:
    [ERROR] Slave: Query caused different errors on master and slave. Error on master: ‘Deadlock found when trying to get lock; try restarting transaction’ (1213), Error on slave: ‘no error’ (0)………………
    [ERROR] Error running query, slave SQL thread aborted

    BTW, I just reported the DEADLOCK problem here: http://bugs.mysql.com/bug.php?id=32210. It’s great if you can have any advice. :-)

    safari

    8 Nov 07 at 11:43 pm

  2. Hi Baron,

    Why are you using multi-table UPDATE to do this? The problem would also be solved reliably if you did the typical thing:

    BEGIN;
    SELECT id FROM workpriority WHERE … FOR UPDATE;
    UPDATE workpriority SET … WHERE id IN (…);
    COMMIT;

    Jeremy Cole

    8 Nov 07 at 11:50 pm

  3. It’s not a bug, actually. This problem is caused by mixing updates to transactional and non-transactional tables. You cannot do this safely with replication.

    When you mix transactional and non-transactional tables, and there is a deadlock or rollback, there is no way to prevent the slave being out of sync with the master — none at all. Some changes will either occur on the master and not the slave, or vice versa. So I usually advise people never to do this.

    Xaprb

    8 Nov 07 at 11:53 pm

  4. How about the deadlock I reported? Is there any way to get rid of it?

    safari

    9 Nov 07 at 12:11 am

  5. Good that you find your bug, that’s really not something you can spot easily. We have been running replication over 6 years now, and afaik we have never hit that bug/feature. :) But I guess we don’t do much updates/deletes with ORDER BY and LIMIT in it. For us, now days out of sync replication is much more rare than it was before (say 3.23 or 4.0).

    Toba

    9 Nov 07 at 1:27 am

  6. Jeremy, that’s a good point. It never occurred to me.

    safari, deadlocks are a fact of life in transactional systems; you can sometimes avoid them by making the different processes read the same index in the same direction, but it’s usually pretty hard to do. Better just build the application to detect and retry the work.

    Xaprb

    9 Nov 07 at 8:33 am

  7. actually from a presentation of Youtube team, I learnt slaves of Mysql Replication are doomed to lag of master because replication is asynchronous and serialized.

    updates on master is concurrent and while user doing one big update on table A, other user can do updates on table B for example.

    when replicating, the update is serialized,
    For slave it read the statement from binlog in the form of

    update on tableA;
    update on tableB;
    ….

    update on tableB will not run until slave finish big update on tableA.

    This seems to be trivial on small database, but for database driven website like youtube and other web2.0 site, this will cause serious problems. That’s why they will go horizontal partition route.

    yingkuan

    22 Jan 08 at 12:19 am

Leave a Reply