Archive for the ‘SQL’ tag
This release contains bug fixes and new features. It also contains a new tool: my implementation of Paul Tuckfield’s relay log pipelining idea. I have had quite a few responses to that blog post, and requests for the code. So I’m releasing it as part of Maatkit.
Changelog for mk-archiver: 2008-01-24: version 1.0.7 * Added --quiet option. * Added --plugin option. The plugin interface is not backwards compatible. * Added --bulkins option. * Added --bulkdel option. * Added --nodelete option. * Changed negatable --ascend option to --noascend. Changelog for mk-parallel-dump: 2008-01-24: version 1.0.5 * The fix for bug #1863949 added an invalid argument to gzip (bug #1866137) * --quiet caused a crash. Changelog for mk-parallel-restore: 2008-01-24: version 1.0.4 * The -D option was used as a default DB for the connection (bug #1870415). Changelog for mk-slave-prefetch: 2008-01-24: version 1.0.0 * Initial release. Changelog for mk-table-sync: 2008-01-24: version 1.0.4 * Made the --algorithm option case-insensitive (bug #1873152). * Fixed a quoting bug. * Made the UTF-8 options configurable.
I dashed off a hasty post about speeding up replication slaves, and gave no references or explanation. That’s what happens when I write quickly! This post explains what the heck I was talking about.
I first heard Paul Tuckfield talk at the first MySQL Camp, in November 2006. He mentioned that he speeds up MySQL replication by “pre-fetching relay logs” on the slave. Actually, I think he used the term “pipelining” at that point. Next Spring, he mentioned the same thing in his keynote address at the 2007 MySQL Conference & Expo. You can download audio and video of his talk from that link. He mentions this approach pretty late in the talk, almost at the end. I’ve been meaning to try duplicating his idea since the first time I heard him talk.
The basic idea is to help overcome replication’s single-threadedness. Under the right conditions, the slave’s SQL thread can become I/O-bound, even though the slave server has lots of unused I/O capacity. As a result, it spends a lot of time just waiting for the disk to return some data, and becomes much slower than it has to be.
Paul’s solution to this problem is to read the statements from the relay log, just a little bit ahead of the SQL thread’s position, convert them into SELECT queries, and execute them on the slave. This causes MySQL to request some of the data from the disk in advance. Then when the slave’s SQL thread wants to update that data, it’s already in memory, and things can potentially go much faster.
How much faster is open to debate. I think Paul sees about 3-4 times faster, but please don’t quote me on that. Farhan Mashraqui also uses this hack and gets some speedup as well.
The problem is, it won’t automatically work for everyone. In theory, it can potentially help if the following are true:
- Your data is much bigger than memory.
- You use a storage engine with row-level locking, like InnoDB.
- Your workload is mostly small (single-row is good), widely scattered, random UPDATE and DELETE statements. (INSERT is less likely to benefit, because the relevant indexes are likely to be “hot” already).
- The slave’s SQL thread is I/O-bound, but the slave has lots of spare I/O capacity. In other words, lots of disk spindles.
My slaves don’t do this kind of work. They do a lot of big multi-table updates and summary queries. There is very little to gain from pre-fetching the indexes and data for these statements, because whatever big query the SQL thread is running is likely to just flush the pre-fetched pages out of memory again before they’re needed. I tried anyway, and sure enough, it didn’t work.
The other problem is, it’s hard to write a generically useful program to do this kind of pre-fetching. It’s not too hard to write something specific to your workload, as Farhan did. But getting it to work right in general requires a lot of smarts, such as figuring out how far ahead of the slave SQL thread to stay, which queries not to try to pre-execute, and so on. I wrote an implementation of it that’s generic and has some intelligence built in. If you’re interested in it, see my previous post (linked at the top of this post).
If you’re thinking about writing something like this yourself, be prepared: it could be a lot of work. I can see how it would be simpler on some workloads, but on mine it was far from simple. I did some silly things, like running out of disk space because of temp files for LOAD DATA INFILE statements. Fortunately, that was just one of my benchmark machines.
If conditions aren’t right, it could really screw you. For example, if your slave has only a single disk, or if you use MyISAM on the slave, you’ll probably just cause problems for yourself. You need to know your hardware and your workload really well. That’s why Paul didn’t release his code, and I’ve hesitated for the same reason.
There’s more information about this in the upcoming High Performance MySQL, 2nd Edition, which I’m helping to write. We also have a lot more information about how to understand your hardware and workload. There’s no way I can fit it all into this post, and I don’t want to try. Even if I weren’t working like a mad dog on the book and had time to put it here, I can’t give away all the book’s goodies, can I? :-)
Paul Tuckfield of YouTube has spoken about how he sped up his slaves by pre-fetching the slave’s relay logs. I wrote an implementation of this, tried it on my workload, and it didn’t speed them up. (I didn’t expect it to; I don’t have the right workload). I had a few email exchanges with Paul and some other experts on the topic and we agreed my workload isn’t going to benefit from the pre-fetching.
In the meantime, I’ve got a pretty sophisticated implementation of Paul’s idea just sitting around, unused. I haven’t released it for the same reasons Paul didn’t release his: I’m afraid it might do more harm than good.
However, if you’d like the code, send me an email at [baron at this domain] and I’ll share the code with you. In return, I would like you to tell me about your hardware and your workload, and to do at least some rudimentary benchmarks to show whether it works or not on your workload. If I find that this is beneficial for some people, I may go ahead and release the code as part of Maatkit.
Update: it’s part of Maatkit now.