Kevin Burton wrote recently about why SHOW SLAVE STATUS is really not a good way to monitor how far behind your replica servers are, and how replica network timeouts can throw off the reported replica lag. I’d like to chime in and say this is exactly why I thought Jeremy Cole’s MySQL Heartbeat script was such a natural fit for the MySQL Toolkit. It measures replica lag in a “show me the money” way: it looks for the effects of up-to-date replication, rather than asking the replica how far behind it thinks it is.
Replication doesn’t even need to be running on the replica. In fact, the tool doesn’t use SHOW SLAVE STATUS at all. This has lots of advantages. For example, it tells you how far the replica lags behind the ultimate master, no matter how deep in the replication daisy-chain it is. In other words, unlike SHOW SLAVE STATUS, it won’t tell you a replica is up-to-date just because it’s caught up to its own master. If a replica’s master is an hour behind, it will report that the replica is an hour behind, too, because it is.
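To make the idea concrete, here is a minimal Python sketch of the heartbeat technique. It uses in-memory SQLite databases as stand-ins for MySQL servers. The heartbeat table, its columns, and the helper functions are hypothetical and are not the tool’s actual schema or code. The replicate() function only simulates a replication stream by copying the row from one server to the next.

```python
import sqlite3

# Hypothetical heartbeat table: a single row (id = 1) whose ts column holds the
# time, in Unix seconds, at which the ultimate master last wrote it.
SCHEMA = "CREATE TABLE heartbeat (id INTEGER PRIMARY KEY, ts REAL NOT NULL)"


def make_server():
    """Return an in-memory SQLite database standing in for one MySQL server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def write_heartbeat(master, now):
    """Run periodically on the ultimate master: overwrite the row with the current time."""
    master.execute("REPLACE INTO heartbeat (id, ts) VALUES (1, ?)", (now,))
    master.commit()


def replicate(source, target):
    """Simulate replication by copying whatever heartbeat the source has applied so far."""
    row = source.execute("SELECT ts FROM heartbeat WHERE id = 1").fetchone()
    if row is not None:
        target.execute("REPLACE INTO heartbeat (id, ts) VALUES (1, ?)", row)
        target.commit()


def replica_lag(replica, now):
    """Lag = current time minus the newest master timestamp that has reached this server.

    No replication status is consulted; only the data that actually arrived counts.
    """
    row = replica.execute("SELECT ts FROM heartbeat WHERE id = 1").fetchone()
    if row is None:
        return None  # no heartbeat has arrived yet
    return now - row[0]
```

Consider a chain of master, middle, and leaf. If middle stops applying events after the heartbeat written at t=0, and leaf stays caught up with middle, then replica_lag(leaf, 3600.0) returns 3600.0. The leaf inherits its master’s delay, which SHOW SLAVE STATUS on the leaf would not reveal. The lag is computed by comparing the current time against a timestamp written on the master, so in a real deployment this approach assumes the servers’ clocks are synchronized (for example, with NTP).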
It’s a really smart approach. You can also run it as a daemon, and it will keep a file updated with running averages of the lag (by default over the last one, five, and fifteen minutes, though you can choose other windows). Your monitoring scripts can then be as simple as “cat /var/log/replica-delay” or something similar.
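The running-average file could work roughly like the following Python sketch. The default window lengths match the ones described above. The LagAverager class, the one-line “1m 5m 15m” file format, and the write-then-rename detail are assumptions for illustration, not the tool’s actual implementation.

```python
import os
import tempfile
from collections import deque


class LagAverager:
    """Keep (time, lag) samples and report the mean lag over trailing windows."""

    def __init__(self, windows=(60, 300, 900)):
        self.windows = tuple(windows)  # seconds; 1, 5 and 15 minutes by default
        self.samples = deque()         # (sample_time, lag_seconds), oldest first

    def add(self, now, lag):
        self.samples.append((now, lag))
        # Discard samples too old to fall inside even the longest window.
        horizon = now - max(self.windows)
        while self.samples and self.samples[0][0] <= horizon:
            self.samples.popleft()

    def averages(self, now):
        """Mean lag for each window, or None if a window holds no samples."""
        result = []
        for window in self.windows:
            recent = [lag for t, lag in self.samples if t > now - window]
            result.append(sum(recent) / len(recent) if recent else None)
        return result


def write_status(path, averages):
    """Atomically replace path with one line of space-separated averages."""
    line = " ".join("-" if a is None else f"{a:.2f}" for a in averages) + "\n"
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        f.write(line)
    os.replace(tmp, path)  # readers never see a half-written file
```

A daemon loop could measure the lag every few seconds, pass each sample to add(), and then call write_status("/var/log/replica-delay", averager.averages(now)). Because the file is written to a temporary name and then renamed, a monitoring script that runs cat never reads a partially written line.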
It’s not a hard tool to write, and I suspect lots of people have written their own, but I bet that between Jeremy, whoever worked on it at Six Apart, and me, we’ve produced a pretty good version of it. It’s part of the MySQL Toolkit, and the full manual is online.