Binary log checksums in MySQL 5.6

MySQL 5.6 will have “checksums in the binary log.” The feature gets described in various ways, but one phrase I’ve heard a few times is, loosely, that it helps ensure replication integrity. That isn’t specific enough to make clear what it does. When I’ve talked about pt-table-checksum and its purpose (for example, on webinars), people often ask whether replication checksums in MySQL 5.6 will make pt-table-checksum obsolete. The answer is no: they do completely different things. But the naming is confusing, a bit like semi-synchronous replication in that regard.

pt-table-checksum ensures that your replicas have the same logical dataset as their masters. They can drift for any number of reasons – someone changes data directly on the replica, there is an error in replication, a nondeterministic change is made on the master in STATEMENT binlog format – the list goes on. MySQL 5.6 will add many safeguards to help prevent or avoid some of these, but they are still possible. You need a tool like pt-table-checksum to verify data integrity on replicas. The server has no built-in way to do that for you.
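To make the idea of comparing logical datasets concrete, here is a minimal Python sketch. It is not how pt-table-checksum is implemented (the real tool runs checksum queries in chunks on the master and lets them replicate); the function names, the in-memory row lists, and the chunk size are all assumptions made up for illustration.

```python
import zlib
from itertools import zip_longest


def chunk_checksums(rows, chunk_size=2):
    """Return one CRC32 per chunk of rows, after sorting rows by their first column."""
    ordered = sorted(rows)
    checksums = []
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        checksums.append(zlib.crc32(repr(chunk).encode("utf-8")))
    return checksums


def differing_chunks(master_rows, replica_rows, chunk_size=2):
    """Return the indexes of chunks whose checksums differ between the two copies."""
    master_sums = chunk_checksums(master_rows, chunk_size)
    replica_sums = chunk_checksums(replica_rows, chunk_size)
    pairs = zip_longest(master_sums, replica_sums)
    return [i for i, (m, r) in enumerate(pairs) if m != r]


# Hypothetical table contents: (id, name, balance)
master = [(1, "alice", 100), (2, "bob", 250), (3, "carol", 75), (4, "dave", 0)]
# Someone changed carol's balance directly on the replica.
replica = [(1, "alice", 100), (2, "bob", 250), (3, "carol", 80), (4, "dave", 0)]

drifted = differing_chunks(master, replica)
in_sync = differing_chunks(master, master)
```

Here `drifted` identifies the second chunk (index 1) as different, while `in_sync` is empty. Note that nothing in this comparison looks at the binary log at all; it only looks at the data that ended up in the tables.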

Binary log event checksums ensure that binary log events arrive without corruption when replicas connect to the master and retrieve its binary log. They detect problems such as bit-flips in memory, bugs in the I/O thread when it reads the log events and writes them to the relay log, network corruption, and so forth. They do not verify that applying an event on the replica produces the same data changes it produced on the master.
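The mechanism can be sketched in a few lines of Python. This is a simplified illustration, not MySQL's actual event layout: the framing (payload followed by a 4-byte little-endian CRC32), the helper names, and the sample event text are assumptions chosen for the example.

```python
import struct
import zlib


def append_checksum(event: bytes) -> bytes:
    """Return the event followed by its CRC32 as 4 little-endian bytes."""
    return event + struct.pack("<I", zlib.crc32(event) & 0xFFFFFFFF)


def verify_checksum(framed: bytes) -> bool:
    """Recompute the CRC32 over the payload and compare it to the trailer."""
    if len(framed) < 4:
        return False
    payload, trailer = framed[:-4], framed[-4:]
    (expected,) = struct.unpack("<I", trailer)
    return (zlib.crc32(payload) & 0xFFFFFFFF) == expected


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate corruption in transit by flipping a single bit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Hypothetical event body; real binary log events are binary structures.
event = b"UPDATE accounts SET balance = balance - 10 WHERE id = 7"
framed = append_checksum(event)

intact_ok = verify_checksum(framed)                  # event arrived unchanged
corrupted_ok = verify_checksum(flip_bit(framed, 42))  # one bit flipped in transit
```

The intact event verifies and the corrupted one does not, which is the point at which a replica can stop rather than apply a damaged event. The check says nothing about whether the statement, once applied, leaves the replica with the same rows as the master.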

I’m really happy with the binary log checksum feature, and glad that it’s enabled by default. I have fixed many replication problems caused by binary logs being transmitted to a relay log incorrectly. Preventing them from happening in the first place, or detecting when they do and halting replication, is a great enhancement. By the way, I requested this feature, so, thanks Oracle!

I'm Baron Schwartz, the founder and CEO of VividCortex. I am the author of High Performance MySQL and lots of open-source software for performance analysis, monitoring, and system administration. I contribute to various database communities such as Oracle, PostgreSQL, Redis and MongoDB.