Archive for the ‘Maatkit’ Category
A clarification on mk-slave-prefetch
It seems to be a popular misconception that mk-slave-prefetch is designed to keep a MySQL replica server “warm” and ready to serve production traffic in case the master is demoted or fails. This is not what mk-slave-prefetch does. It’s related, and easy to confuse, but its purpose is different.
The mk-slave-prefetch tool is designed to try to execute a read-only approximation of the write workload that the replica is about to have to perform. It is meant to do this just a little bit before the replication thread (which can only be true if replication is lagging), so that when the replica replays writes to execute replication, it doesn’t have to wait for disk I/O.
Keeping caches warmed up for production traffic requires that the read workload, which does not flow through relay logs, is executed on the server. If you point mk-slave-prefetch at a server, you’re just double-executing the write workload in a read-only fashion.
Perhaps an example will clarify. Suppose that a master database has a bunch of writes to WritableTable, and lots and lots of reads to ReadableTable. On the replica, the only queries in the relay log will be on WritableTable. And what will mk-slave-prefetch be doing? Executing SELECT queries against WritableTable. No traffic on ReadableTable at all. If you switch to use the replica as the primary server, ReadableTable’s data won’t be in the caches.
To actually warm up the replica, you need to replay the queries against the ReadableTable. You can only find those by looking at the master. You can read its query logs, or watch its processlist, or capture TCP traffic, or any other method of capturing read traffic. There is a feature built into mk-query-digest to help you capture and replay these against the replica: –execute.
A parting note: mk-slave-prefetch is an extremely niche tool that generally doesn’t help replicas keep up with their masters. Only in very special cases is the hardware, data, and workload suitable for what it does.
Planned change in Maatkit & Aspersa development
I’ve just sent an email to the Maatkit discussion list to announce a planned change to how Maatkit (and Aspersa) are developed. In short, Percona plans to create a Percona Toolkit of MySQL-related utilities, as a fork of Maatkit and Aspersa. I’m very happy about this change, and I welcome your responses to that thread on the discussion list.
New Maatkit tool: mk-table-usage
This month’s Maatkit release includes a new tool that’s kind of an old tool at the same time. We wrote it a couple years ago for a client who has a very large set of tables and many queries and developers, and wants the database’s schema and queries to self-document for data-flow analysis purposes. At the time, it was called mk-table-access and was rather limited — just a few lines of code wrapped around some existing modules, with an output format that wasn’t generic enough to be broadly useful. Thus we didn’t release it with Maatkit. We recently changed the name to mk-table-usage (to match mk-index-usage), included it in the Maatkit suite of tools, and enhanced the functionality a lot.
What’s this tool good for? Well, imagine that you’re a big MySQL user and you hire a new developer. Now you need to bring the new person up to speed with your environment. Or, you want to understand where the data in some table actually comes from. Or, you want to drop a column, but you’re not sure where that data is used and what other code will be affected. Or you want to find all SQL statements that modify a table. Wouldn’t it be nice to have a graph of all your tables and the data flows between them? With this tool you can parse the flow of data in SQL statements, in terms of Table-From → Table-To, and print the results, annotated by the statement’s fingerprint.
The client who sponsored the development of this tool is using it as an auditing mechanism, for some of the purposes I just mentioned, and also to help enforce their SQL coding standards. It can be used for a lot more than that, though. I haven’t done this yet, but it should be easy to write some quick 5-line script to transform it into graphviz format and produce graphs from it, or import into a table that represents edges and run queries against it, and so on. (The client is doing some of those things, but they aren’t asking me to help, so I’m taking their word for it that the output format they chose is easily amenable to these tasks.)


