How to check MySQL replication integrity continually
I have recently added some features to Maatkit’s mk-table-checksum tool that can make it easy to checksum the relevant parts of your data more frequently (i.e. continually, but not continuously). This in turn makes it possible for you to find out much sooner if a slave becomes different from its master, and then you can take action before the differences affect more of your data.
The new features let you apply your knowledge of your data to the checksumming operation. For example, if table X is append-only, then you can make it checksum only the new rows (those inserted since the last checksum). If the table is MyISAM, then it has a last-modified timestamp, which comes from the .MYI file’s timestamp — that’s another easy way to find out whether the table has changed since the last time you ran the checksum.
The new features also let you checksum only part of the data. For example, you can checksum 1/7th of your data each day, which is pretty likely to catch differences if they’re widespread. And just in case they’re localized, you can checksum a different portion each day so you cover the whole dataset each week.
Finally, you can apply these new options per-table, so some tables get checksummed one way and some another way. In fact, there’s now a mechanism for you to specify most options per-table, so you can specify the checksum algorithm, sleep time, chunksize, and a bunch of other things.
Let’s take a look at the new features.
Checksumming only part of the data
Suppose you want a rough idea of whether your data is really different on a slave. Maybe you’re a consultant who needs to check a really big data set to see if there’s cause for concern. (Hmm, this sounds familiar, almost as though… nevermind.) One way to do that is to checksum a random sample of the rows. Let’s checksum 5% of the data:
mk-table-checksum --replicate test.checksum --chunksize 1000 --probability 5 localhost
When this completes, you can check the results on the slave.
While this is handy, it’s not what you need if you’re trying to set up a routine job to check all your data on an ongoing basis. You want complete coverage over some period of time. That’s what --modulo and --offset are for. Let’s do 1/7th of the data every day, to achieve full coverage over the course of a week:
mk-table-checksum --replicate test.checksum --chunksize 1000 \ --modulo 7 --offset 'WEEKDAY(NOW())' localhost
Notice that I’m passing a SQL expression to the --offset option. There’s a little bit of magic here. The tool puts SELECT in front of this and executes it in MySQL; if there’s no error, then the result of the SELECT is used as the option’s argument.
Checksumming only changed data
The final new feature is the ability to checksum based on newness.
Scenario 1 is tables that don’t change often, so they can be skipped entirely. Let’s skip things that haven’t changed in the last week:
mk-table-checksum --replicate test.checksum --chunksize 1000 \ --since 'CURRENT_DATE - INTERVAL 7 DAY' localhost
Again, the argument is an expression that MySQL can evaluate. (You could also write clever things here, such as selecting from an actual table.) If the table has an Update_time timestamp, and the value of --since looks like a temporal value, then the two are compared and the table can be skipped or checksummed, based on that comparison.
But there’s another scenario. The --since value can also be the value of the table’s primary key (actually, the column by which the table is chunked, but the primary key is usually preferred for that.) Suppose you have an auto_increment primary key column named ID. Last time you checksummed, the maximum value in that column was 123,456. This time we can checksum only newer rows:
mk-table-checksum --replicate test.checksum --chunksize 1000 \ --tables mydb.mytbl --since 123456 localhost
The table will be divided into chunks of 1000 rows based on the range of values in the ID column, and the --since value will be applied as an extra WHERE clause mentioning that column. You can also use the --sincecol option to specify which column to apply the WHERE clause to, if necessary.
Putting it all together
So far these options are useful, but of course you don’t want to checksum your servers one table at a time with an option here, an option there, and so on. Let’s see how to simplify this so you can just run the tool and make it Do The Right Thing on every table.
This is the purpose of the --argtable argument, which lets you specify per-table options for the checksum operation. You create a table — I’ll use checksum_args, but you can call it whatever you like. What’s important is the columns in the table:
CREATE TABLE checksum_args (
db char(64) NOT NULL,
tbl char(64) NOT NULL,
-- other columns as desired
PRIMARY KEY (db, tbl)
);
You pass the name of this table to the tool:
mk-table-checksum --argtable mydb.checksum_args [other options....]
Now you can add columns to the table, named the same as the short form of some of mk-table-checksum’s options. For example, if you want mydb.mytbl to be checksummed in chunks of 500 rows, add a C column to the table. (The short form of --chunksize is -C.) Now insert a row into the table:
insert into checksum_args(db, tbl, C) values('mydb', 'mytbl', 500);
You can also add a since column to this table. And finally, you can use --savesince to specify whether to save the last-used --since back into the table after the checksum operation. This way the value persists from one run to the next. For tables that are skipped based on the timestamp of the table (instead of the biggest known value of the chunk column), the current timestamp is saved instead.
You can also add the --modulo and --offset into the table.
insert into checksum_args(db, tbl, C, M, O)
values('mydb', 'mytbl', 500, 7, 'DAYOFWEEK(NOW())');
Conclusion
So that’s a quick overview of the new features, which open up a range of new possibilities for frequent checksumming of data. Note that frequent isn’t the same as continuous (which is also not the same thing as continual, so don’t say the title of this post is false advertising). Continuous verification that a slave is in sync with the master requires some support in the server. However, you could easily checksum the newest rows in certain tables as often as you want, like every minute. In fact it’s probably a good idea to do checksums more often, and in smaller nibbles, with the new features I’ve explained here. You can ease the workload that way — spread it over time.
The features are still evolving, and the newest code in the Subversion trunk is what you should probably look at if you’re interested in using them. (Some of them aren’t completed in the last release.) If you have suggestions or find bugs, please use the Maatkit Google Code project to communicate them to the dev team.
But most of all, enjoy and profit from this work, and spread the word about Maatkit!
Note: In the spring of 2009, there was a major effort to clean up and standardize Maatkit command-line options, so the specific command-line options and column names mentioned in this blog post are no longer valid. Thanks to Sheeri for pointing these out. For example, use chunk-size instead of C, because the -C option went away and –chunk-size took its place.

If the table is MyISAM, you can also keep a constant running checksum with the MySQL built-in table option CHECKSUM=1. That puts a checksum into the CHECKSUM field of INFORMATION_SCHEMA.TABLES, I believe whenever the table is modified.
The title is a bit misleading, you mean “incrementally”, not “continually” — continually makes it sound like there’s a feature that works like MySQL’s CHECKSUM=1 table option for MyISAM tables (5.0 and up). (unless I’m misunderstanding, it’s not continual, just incremental, right?)
Sheeri
4 Oct 08 at 5:07 pm
Heh, I wondered if I’d get into a debate about the meaning of continual :) No, I think the title is correct, though it is linkbait.
My point about this is to do continual master/slave checks. You can’t do consistent checksums via replication with CHECKSUM TABLE.
Hmmm. Unless you use the INFORMATION_SCHEMA and have CHECKSUM=1 enabled as you point out for MyISAM tables. Then you could, in fact… that would be a useful feature for mk-table-checksum too.
Xaprb
4 Oct 08 at 6:49 pm
Sheeri, I don’t believe that MyISAM checksum is much use unless you’d stop updating on the master for a few seconds before and while you check, because otherwise it’s very likely to be different between master and slave anyway?
Arjen Lentz
5 Oct 08 at 6:48 am
Hi
currently I’m using a check_table updated every minute or so with the current timestamp. Then I check in the slave if the checksum of that table is correct. So I can’t understand completely this improvement,I mean, what are the drawbacks of not using a dedicated check table for this task as I do? I can’t figure a situation where you cannot create a table like this…
I’m not complaining about these new features, just asking why you implemented them :)
Thanks in advance
Vide
5 Oct 08 at 2:21 pm
A side note about MySQL internal checksums is they don’t work with FLOAT columns:
http://bugs.mysql.com/bug.php?id=30041
Baron, out of curiosity have you changed your checksums to work around this?
Morgan Tocker
5 Oct 08 at 7:46 pm
Morgan, yes, there’s some option for digits of precision. There are also options for trimming whitespace, etc (so you can compare 4.0 to 5.0 servers, for example.)
Xaprb
5 Oct 08 at 8:09 pm
hi Vide,
Please explain what you mean by a check table.
Xaprb
5 Oct 08 at 9:25 pm
CREATE TABLE `SYS_check_sync` (
`CurTime` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP
) ENGINE=InnoDB
with the “CurTime” field updated every minute on the master. Then with mk-table-checksum I check that the table checksum on master/slaves is the same. If not, then I send a warning and block connections to the out-of-sync slave (we cannot tolerate delays in our environment). This way is very quick because the table is really really tiny.
Vide
6 Oct 08 at 3:20 am
Vide, it’s not the same thing at all. Checking that a timestamp in one table is the same from master to slave doesn’t say anything about the rest of your data. Data you don’t checksum directly is an unknown quantity. At best, you’re checking for replication lag. Lagging != out-of-sync.
You should be using mk-heartbeat anyway to check replication lag.
Your method of checking provides you no useful check at all, really. Unless I’m terribly misunderstanding you.
Xaprb
6 Oct 08 at 9:55 am
Baron, well, I’ve just checked mk-heartbeat and in fact it does more or less the same I’m doing “manually”. Anyway if that check table isn’t updated than it means that replication is broken, because Mysql stops immediately the slave if it encounters an error.. I mean, it’s just a “global” sanity check. If this table isn’t updated, then I assume replication is b0rked. Otherwise, replication is working as expected… or at least this is what i was thinking, so here my comments.
Obviously my method doesn’t check if data is really synced (there could be logical errors that doesn’t break replication)… so now I understand what these new features are for :)
Vide
6 Oct 08 at 10:07 am
[...] just wrote about how Maatkit has gotten some new features for helping people make sure their slaves have an exact copy of the master’s data. This work [...]
Maatkit thanks Vantage Media at Xaprb
9 Oct 08 at 3:31 pm
I really love this, and the high performance website! I wish that 5 years ago, when I started at MySQL that I had a time machine portal into these sites to get up to speed with internal dev … oh, the bugs I could’ve fixed and how fast I could’ve come up to speed. Also, a time machine to fetch the book you guys wrote…
CaptTofu
9 Oct 08 at 4:31 pm
Thanks!
Xaprb
9 Oct 08 at 4:47 pm
[...] newest release of Maatkit contains mk-audit bug fixes, lots of new features for mk-table-checksum so you can check replication more frequently (see my earlier post…, and more bug fixes for other [...]
Maatkit version 2442 released at Xaprb
18 Oct 08 at 5:50 pm
Baron, feel free to delete this comment if you like, when the post is updated.
These notes are for revision 3710
–chunksize is not a valid option, but –chunk-size is valid
-C does not work as an option to mk-table-checksum
“chunksize” and “C” and “chunk_size” are not allowable arg-table columns.
`chunk-size` is an allowable arg-table column, but it does not seem to function properly; probably because MySQL will interpret it as “chunk”-”size” unless enclosed by backtics.
Also, I made a checksum_args table, and did not put all the tables in the database there:
CREATE TABLE `checksum_args` (
`db` char(64) NOT NULL,
`tbl` char(64) NOT NULL,
`ts` datetime default NULL,
`since` varchar(35) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
mysql> insert into checksum_args (db,tbl) SELECT TABLE_SCHEMA,TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE
_SCHEMA=’phpwiki’ and TABLE_TYPE=’BASE TABLE’;
Query OK, 10 rows affected (0.01 sec)
Records: 10 Duplicates: 0 Warnings: 0
But when I ran the checksum, it was going through all the tables in all the databases.
Sheeri K. Cabral
19 Jun 09 at 6:44 pm
Sheeri, the checksum_args only specifies arguments to apply to specific tables. It’s basically an override table. It doesn’t specify that only the given tables will be checksummed. We’ll update the docs to clarify this. I think (but I lost track) that you also mentioned the lack of quoting with chunk-size on an issue you entered?
Xaprb
2 Jul 09 at 11:21 am