MySQL Table Checksum 1.1.0 released

Download MySQL Table Checksum

MySQL Table Checksum 1.1.0 adds many improvements, but the most important is a new way to ensure slaves have the same data as their master. Instead of checksumming the slave and the master, it can now insert the checksum results directly on the master via an INSERT.. SELECT statement. This statement will replicate to the slave, where a simple query can find tables that differ from the master. This makes a consistent, lock-free checksum trivially easy.

There are also many other feature improvements and bug fixes, compatibility with MySQL 3.23.2 through 6.0-alpha, and I’ve finally gotten the documentation finished to my satisfaction. Plus I wrote a test suite.

The new checksum method

While I was at last week’s conference, Martin Friebe and I went to supper and he suggested doing checksums via INSERT.. SELECT to avoid all the hassles with locking, waiting for slaves to catch up, and so on. This method guarantees checksums are taken at the same point in replication on the master and slave, and requires no locking or waiting. It’s a much better way to verify your slaves are in sync with the master.

The technique is implemented in two ways: with a variable accumulator (a modified version of the original checksum queries I wrote about in previous articles), and with a BIT_XOR checksum similar to the checksum queries used by MySQL Table Sync. Each method has strengths and weaknesses.

It’s now practical — nay, easy — to run MySQL Table Checksum from a cron job, and just as easy to check whether a slave is out of sync with the master. First, you create a table to hold the checksums:

  CREATE TABLE test.checksum (
     db         char(64)     NOT NULL,
     tbl        char(64)     NOT NULL,
     this_crc   char(40)     NOT NULL,
     this_cnt   int unsigned NOT NULL,
     master_crc char(40)         NULL,
     master_cnt int unsigned     NULL,
     ts         timestamp    NOT NULL,
     PRIMARY KEY (db,tbl)
  );

Then you run MySQL Table Checksum:

mysql-table-checksum --replicate=test.checksum master-hostname

When it’s all done, you check the slave:

SELECT db, tbl, this_cnt-master_cnt AS cnt_diff,
     this_crc <> master_crc AS crc_diff
FROM checksum
WHERE master_cnt <> this_cnt OR master_crc <> this_crc;

That’s it! Is that easy to throw into Nagios, or what?

Technorati Tags:No Tags

You might also like:

  1. How to know if a MySQL slave is identical to its master
  2. Introducing MySQL Table Checksum
  3. How to sync tables in master-master MySQL replication
  4. How to calculate table checksums in MySQL
  5. How MySQL replication got out of sync

9 Responses to “MySQL Table Checksum 1.1.0 released”


  1. 1 Jaap Taal

    I’m trying to synchronise several databases. The main database is empty but is used as a reference for the table layout. Each of the database should be checked to see if new tables should be added, columns should be added or changed (renaming a column will not work though).
    This is a manual process (which depends on the version of the software), so I don’t want MySQL to handle it.

    Do you know of an algoritm (or program) that does this efficiently on a linux platform? Currently my script connects remotely to the server and executes DESCRIBE for each table, but this is getting very slow. I was wandering, can you generate a hash for the table layout on the client side and server side and compare them???

  2. 2 Francois Saint-Jacques

    Have you done extensive performance testing? I’m reading your past article about mysql table checksum and you’re talking about a possible ‘overhead’. Let say we have a 20G innodb table, I wonder how much time it take for the drive to melt down to death ;)

  3. 3 Xaprb

    Hi Jaap, I think mysqldiff will probably help you, though I have never used it myself.

  4. 4 Xaprb

    Francois, it performs acceptably for me (that is a relative term of course) on larger data sets than that. I will time it and write another article with numbers soon. Thank you for the suggestion.

    I’m sure if MySQL would build something into the server, it would be more efficient than anything I can do outside the server. Until they do that, your options are fairly limited (understatement of the month, haha).

  5. 5 zhur

    Baron, what do you think about calculating checksum of big MyISAM tables by chunks?
    For example, we have table persons with primary key (id).
    1. Let's determine table characteristics.
    mysql> select min(id), max(id), count(*) from persons;
    ——— ———- ———-
    | min(id) | max(id) | count(*) |
    ——— ———- ———-
    | 25 | 12736201 | 5417799 |
    ——— ———- ———-

    2. Now let’s calculate checksums for chunks with replication. Also we can sleep between chunks for minimize locking time.
    STEP = ( 12736201 - 25 ) / CHUNKS_NUM
    chunk #0: id 12736201
    3. Now we can compare checksums for all chunks and find differences.

    This method can produce “false positives” errors, but very rarely(when we update primary key) and can’t work for tables without primary/unique keys, but don’t produce long locks - it’s very important for some applications.

    P.S. Sorry for my english ;-)

  6. 6 Xaprb

    Hi! I think this is important. Another person has mentioned it as well. I would rather discuss it on the mysqltoolkit-discuss mailing list, but blog comments are okay as well.

    First, initial tests show that COUNT(*) is not much faster than the checksums, at least on InnoDB tables (I will publish these results soon). So this strategy would not be a good one for all scenarios. Let’s say we just omit the COUNT(*) part and select the MIN() and MAX(), which can be satisfied quickly from indexes. Then we divide this into ranges and repeatedly query, possibly with a sleep, as you said. I think rather than just divide into a fixed number of ranges, the DBA should specify the maximum number of rows to checksum at once, and the tool can take care of dividing into ranges with BETWEEN.

    This has the disadvantage that the ranges might not be the same size and some of them might even be empty if there are holes in the sequence, but I think it is practical.

    This is an important feature I’ll work on soon, but I’m not sure how soon — some people have requested features for MySQL Table Sync that are also important. You are welcome to help if you wish. If you know Perl the code is relatively straightforward. Patches are welcome.

  7. 7 Xaprb

    This is done. I will release this code soon, but for right now you can get it from the latest trunk in SVN. I hope you get a chance to test it and report any bugs you find.

  8. 8 Luis Torres

    Question here:

    If tables in the master get modified while mysql-table-checksum is running then this won’t work, or will it?

    In other words, do my master and slave have to be just sitting there accepting no connections for this to work? Or just one part has to be stopped (replication itself maybe).

  9. 9 Xaprb

    No, it’ll work fine with replication running.

Leave a Reply

Please do not use this blog to get help with problems or bugs in Maatkit or innotop: use the Sourceforge forums, mailing list, or bug trackers. If you're asking for help with MySQL, please use the MySQL mailing list instead. I'm writing a book and my time is extremely limited :-)