Jason Hull of OpenSource Connections, a company in my town, posted an article on what Sun’s acquisition of MySQL means for the US government. I thought Planet MySQL readers might appreciate a different angle on the issue than many of the Planet MySQL posts, which are often focused on business or community more than government. (I’m just passing the link along, not agreeing or disagreeing).
Technorati Tags:government, Jason Hull, mysql, SunArchive for January, 2008
I’ll be attending the 2008 MySQL Conference and Expo again this year, and I’m looking forward to hearing some great sessions, meeting new and old friends, and giving sessions myself. As a proposal reviewer, I looked at and voted on 250+ proposals for sessions and tutorials for this conference. There are going to be some great sessions and tutorials.[1]
If you haven’t come to the conference previously, it’s well worth your time and money, in my opinion.
I (Baron Schwartz) am giving two sessions myself, on extremely practical topics. One is the query cache, and the other is EXPLAIN. Both are the subject of many myths and misunderstandings! My goal is to remove all the programmer-speak and show you how they really work. Once you understand that, you can understand the technical terminology. (But it’s very hard to go the other direction).
I haven’t decided yet which sessions I want to attend, but I know this: I’m not going to miss seeing how Beat Vontobel solves a Su Doku puzzle with only self-joins. His session on views last year was just amazing.
Hopefully there’ll be plenty of time to sit down for meals and chats with all the people I correspond with throughout the year, but rarely get to see or talk to!
1And no, I don’t get any kickback for saying nice things about the conference. Even reviewing all those proposals was a volunteer job. And Jay Pipes tricked me into it, the rat! He told me it would be only a few hours. Haha, you can’t review 250 proposals in a few hours… I have to say though, some of them were really rewarding to read. One of them was about holding a cosmic prayer circle or something like that. Without expressing any opinion on my religion/spirituality, I did have to vote NO on that one — sorry, wrong conference.
Technorati Tags:Beat Vontobel, conference, explain, Jay Pipes, mysql, mysqlconf2008, mysqluc2008, query cacheYou might also like:
This release contains bug fixes and new features. It also contains a new tool: my implementation of Paul Tuckfield’s relay log pipelining idea. I have had quite a few responses to that blog post, and requests for the code. So I’m releasing it as part of Maatkit.
Changelog for mk-archiver: 2008-01-24: version 1.0.7 * Added --quiet option. * Added --plugin option. The plugin interface is not backwards compatible. * Added --bulkins option. * Added --bulkdel option. * Added --nodelete option. * Changed negatable --ascend option to --noascend. Changelog for mk-parallel-dump: 2008-01-24: version 1.0.5 * The fix for bug #1863949 added an invalid argument to gzip (bug #1866137) * --quiet caused a crash. Changelog for mk-parallel-restore: 2008-01-24: version 1.0.4 * The -D option was used as a default DB for the connection (bug #1870415). Changelog for mk-slave-prefetch: 2008-01-24: version 1.0.0 * Initial release. Changelog for mk-table-sync: 2008-01-24: version 1.0.4 * Made the --algorithm option case-insensitive (bug #1873152). * Fixed a quoting bug. * Made the UTF-8 options configurable.Technorati Tags:archiving, mysql, Paul Tuckfield, replication, sql
You might also like:
I dashed off a hasty post about speeding up replication slaves, and gave no references or explanation. That’s what happens when I write quickly! This post explains what the heck I was talking about.
I first heard Paul Tuckfield talk at the first MySQL Camp, in November 2006. He mentioned that he speeds up MySQL replication by “pre-fetching relay logs” on the slave. Actually, I think he used the term “pipelining” at that point. Next Spring, he mentioned the same thing in his keynote address at the 2007 MySQL Conference & Expo. You can download audio and video of his talk from that link. He mentions this approach pretty late in the talk, almost at the end. I’ve been meaning to try duplicating his idea since the first time I heard him talk.
The basic idea is to help overcome replication’s single-threadedness. Under the right conditions, the slave’s SQL thread can become I/O-bound, even though the slave server has lots of unused I/O capacity. As a result, it spends a lot of time just waiting for the disk to return some data, and becomes much slower than it has to be.
Paul’s solution to this problem is to read the statements from the relay log, just a little bit ahead of the SQL thread’s position, convert them into SELECT queries, and execute them on the slave. This causes MySQL to request some of the data from the disk in advance. Then when the slave’s SQL thread wants to update that data, it’s already in memory, and things can potentially go much faster.
How much faster is open to debate. I think Paul sees about 3-4 times faster, but please don’t quote me on that. Farhan Mashraqui also uses this hack and gets some speedup as well.
The problem is, it won’t automatically work for everyone. In theory, it can potentially help if the following are true:
- Your data is much bigger than memory.
- You use a storage engine with row-level locking, like InnoDB.
- Your workload is mostly small (single-row is good), widely scattered, random UPDATE and DELETE statements. (INSERT is less likely to benefit, because the relevant indexes are likely to be “hot” already).
- The slave’s SQL thread is I/O-bound, but the slave has lots of spare I/O capacity. In other words, lots of disk spindles.
My slaves don’t do this kind of work. They do a lot of big multi-table updates and summary queries. There is very little to gain from pre-fetching the indexes and data for these statements, because whatever big query the SQL thread is running is likely to just flush the pre-fetched pages out of memory again before they’re needed. I tried anyway, and sure enough, it didn’t work.
The other problem is, it’s hard to write a generically useful program to do this kind of pre-fetching. It’s not too hard to write something specific to your workload, as Farhan did. But getting it to work right in general requires a lot of smarts, such as figuring out how far ahead of the slave SQL thread to stay, which queries not to try to pre-execute, and so on. I wrote an implementation of it that’s generic and has some intelligence built in. If you’re interested in it, see my previous post (linked at the top of this post).
If you’re thinking about writing something like this yourself, be prepared: it could be a lot of work. I can see how it would be simpler on some workloads, but on mine it was far from simple. I did some silly things, like running out of disk space because of temp files for LOAD DATA INFILE statements. Fortunately, that was just one of my benchmark machines.
If conditions aren’t right, it could really screw you. For example, if your slave has only a single disk, or if you use MyISAM on the slave, you’ll probably just cause problems for yourself. You need to know your hardware and your workload really well. That’s why Paul didn’t release his code, and I’ve hesitated for the same reason.
There’s more information about this in the upcoming High Performance MySQL, 2nd Edition, which I’m helping to write. We also have a lot more information about how to understand your hardware and workload. There’s no way I can fit it all into this post, and I don’t want to try. Even if I weren’t working like a mad dog on the book and had time to put it here, I can’t give away all the book’s goodies, can I? :-)
Technorati Tags:Farhan Mashraqui, mysql, Paul Tuckfield, replication, sqlYou might also like:
Paul Tuckfield of YouTube has spoken about how he sped up his slaves by pre-fetching the slave’s relay logs. I wrote an implementation of this, tried it on my workload, and it didn’t speed them up. (I didn’t expect it to; I don’t have the right workload). I had a few email exchanges with Paul and some other experts on the topic and we agreed my workload isn’t going to benefit from the pre-fetching.
In the meantime, I’ve got a pretty sophisticated implementation of Paul’s idea just sitting around, unused. I haven’t released it for the same reasons Paul didn’t release his: I’m afraid it might do more harm than good.
However, if you’d like the code, send me an email at [baron at this domain] and I’ll share the code with you. In return, I would like you to tell me about your hardware and your workload, and to do at least some rudimentary benchmarks to show whether it works or not on your workload. If I find that this is beneficial for some people, I may go ahead and release the code as part of Maatkit.
Update: it’s part of Maatkit now.
Technorati Tags:mysql, Paul Tuckfield, replication, sqlYou might also like:
Whew! I just finished a marathon of revisions. It’s been a while since I posted about our progress, so here’s an update for the curious readers.
I just finished revising the last two major chapters that Peter Zaitsev hasn’t yet reviewed. Peter has been essentially going through the chapters like a very thorough technical reviewer. He makes corrections, points out where things aren’t clear or need examples, and adds more material.
By “finished revising,” I mean finished expanding the outline into a full chapter. We’re still working at the level of “this chapter is mostly there, but we might decide to revise it more.” We will most certainly do so in many cases. There are some chunks of material that I’ve marked TODO to put into other chapters, for example. We’re not at the level of a final draft with any chapter except the chapter on MySQL’s architecture, but we’re getting close with the others now.
Most of the chapters are in tech review now, and we’ve gotten a few of them back. The comments from the reviewers have been very helpful. We expanded the Replication chapter quite a bit after tech review. (And then Peter reviewed it and we expanded it even more). When the tech reviewers return comments on the other chapters, we’ll revise some more.
We’re up to 529 pages in OpenOffice.org now. At my calculated ratio of 1 page = 1.1 pages in print, that’s about 582 pages in print. And that’s not counting the Replication chapter, which doesn’t have all of its illustrations yet. I predicted we’d break 500 pages; we might get close to 600. These are very, very densely written, too. No offense to the first edition, but the tone is quite different; much less light-hearted banter, much more compressed information. Peter is a walking encyclopedia, and never seems to run out of details we really ought to include because they’re important (and they are).
We may, or may not, go to production in the next few weeks. Regardless, I think we’re still on track to have the book on shelves by the MySQL Conference & Expo in April. Look for me there. I’ll be easy to find: I’ll be the tall guy with a permanent silly grin. (You’d grin too if you finished writing a book that’s been this much work!)
I’ve posted rough outlines for many of the other chapters. The two Peter and I just finished working on are the Scaling/HA/Load-Balancing/Failover chapter, and the Application-Level Optimization chapter. The Scaling/HA chapter is pretty long and very involved, and goes into a lot of detail on scaling in particular, especially horizontal scaling via sharding. (We use “sharding” because it’s less confusing than calling it “partitioning,” which already means too many different things in databases).
The Application-Level Optimization chapter is a little shorter. It’s mostly about caching strategies, how to make a web server run well, and so on. These aren’t what the book focuses on directly, but you can either help or hurt the database server a lot with your application design. Our goal here is to help people avoid the common mistakes.
For the curious, here’s the current outline for these two chapters:
Scaling and High Availability
Terminology
Scaling MySQL
Planning for Scalability
Buying Time Before Scaling
Scaling Up
Scaling Out
Functional Partitioning
Data Sharding
Choosing a Partitioning Key
Multiple Partitioning Keys
Querying Across Shards
Allocating Data, Shards, and Nodes
Arranging Shards on Nodes
Fixed Allocation
Dynamic Allocation
Mixing Dynamic and Fixed Allocation
Explicit Allocation
Sidebar: Re-Balancing Shards
Tools for Sharding
Scaling Back
Keeping Active Data Separate
Scaling by Clustering
Clustering
Federation
Load Balancing
Connecting Directly
Splitting Reads and Writes in Replication
Changing Application Configuration
Changing DNS Names
Moving IP Addresses
Introducing a Middleman
MySQL Proxy
Load Balancers
Load Balancing Algorithms
Adding and Removing Servers in the Pool
Load Balancing with a Master and Multiple Slaves
High Availability
Planning for High Availability
Adding Redundancy
Shared-Storage Architectures
Replicated-Disk Architectures
Synchronous MySQL Replication
Failover and Failback
Promoting a Slave or Switching Roles
Virtual IP Addresses or IP Takeover
MySQL Master-Master Replication Manager
Middleman Solutions
Handling Failover in the Application
And here’s the outline for the Application-Level Optimization chapter:
Application-Level Optimization
Application Performance Overview
Find the Source of the Problem
Look for Common Problems
Web Server Issues
Finding the Optimal Concurrency
Caching
Sidebar: Caching Doesn't Always Help
Caching Below the Application
Application-Level Caching
Cache Control Policies
Cache Object Hierarchies
Pre-Generating Content
Extending MySQL
Alternatives to MySQL
The thing that makes me the happiest right now is that we’re clearly going to make it. For a while, there was just so much work left to do that it was impossible to estimate how much. (Ask my wife: I was wrong many times when she asked how long it would take me to finish a chapter). I also didn’t know how much revision would be necessary, which is very scary; revising takes about four times as long as writing a first draft, by my reckoning. At this point, the remaining work is much smaller, and much easier to estimate. And now I no longer flip-flop daily between “I think we can, I think we can” and “please don’t ask, because I don’t know and I want a vacation.”
Subversion shows me that Peter has the Security chapter locked right now. This one is not a huge one, and Arjen Lentz has already reviewed it as well, so I don’t expect it to be a huge amount of work to revise. After that, it’s minor chapters and appendices. (We might actually convert the chapters on Server Status and Tools into appendices, since they got cannibalized when we realized their material fit better elsewhere. They also don’t have a very chapter-ish feel; they feel more like appendices). We’ve added a few more appendices, including one on EXPLAIN and one on debugging server and storage-engine locking problems. These are all great reference material.
See you at the conference in April!
Technorati Tags:Arjen Lentz, high availability, load balancing, mysql, OpenOffice.org, optimization, performance, Peter Zaitsev, replication, scaling, sharding, sql, technical review, writingYou might also like:
- High Performance MySQL, Second Edition: Replication, Scaling and High Availability
- Progress report on High Performance MySQL, Second Edition
- Progress on High Performance MySQL, Second Edition
- High Performance MySQL, Second Edition: Advanced SQL Functionality
- High Performance MySQL, Second Edition: Backup and Recovery
… I’m almost insulted! Look at this!

No related posts.
Maatkit contains essential command-line utilities for MySQL, such as a table checksum tool and query profiler. It provides missing features such as checking slaves for data consistency, with emphasis on quality and scriptability.
This release contains bug fixes and new features.
Changelog for mk-archiver: 2008-01-05: version 1.0.6 * Made suffixes for time options optional (bug #1858696). Changelog for mk-deadlock-logger: 2008-01-05: version 1.0.8 * Made suffixes for time options optional (bug #1858696). Changelog for mk-heartbeat: 2008-01-05: version 1.0.6 * Made suffixes for time options optional (bug #1858696). Changelog for mk-parallel-dump: 2008-01-05: version 1.0.4 * Second and later chunks had DROP/CREATE TABLE (bug #1863949). * Made suffixes for time options optional (bug #1858696). * --locktables didn't disable --flushlock. Changelog for mk-parallel-restore: 2008-01-05: version 1.0.3 * Made suffixes for time options optional (bug #1858696). * --ignoretables was ignored. Changelog for mk-slave-delay: 2008-01-05: version 1.0.5 * Made suffixes for time options optional (bug #1858696). * The program was ignoring some connection parameters. * Made the program use master when the I/O thread waits for relay log space. Changelog for mk-slave-restart: 2008-01-05: version 1.0.5 * Made suffixes for time options optional (bug #1858696). * Added logic to discard corrupt relay logs. * Added --monitor, --sentinel, and --stop. * Added --quiet and changed --verbose to 1 by default. * Added the ability to monitor many servers with --recurse. Changelog for mk-table-checksum: 2008-01-05: version 1.1.24 * Added support for the FNV_64 UDF, which is distributed with Maatkit. * --emptyrepltbl didn't Do The Right Thing by default. * --explain didn't disable --emptyrepltbl * Made suffixes for time options optional (bug #1858696). * The --float-precision option was ignored. * (mk-checksum-filter) -i, -d options worked only on multiple files. Changelog for mk-table-sync: 2008-01-05: version 1.0.3 * Added the --function command-line option. * Added support for the FNV_64 hash function (see mk-table-checksum). * Made suffixes for time options optional (bug #1858696). * InnoDB tables use --transaction unless it's explicitly specified.Technorati Tags:backups, mysql, replication, sql, synchronization, UDF
You might also like:
My posts lately have been mostly progress reports and release notices. That’s because we’re in the home stretch on the book, and I don’t have much spare time. However, a lot has also been changing with Maatkit, and I wanted to take some time to write about it properly. I’ll just write about each tool in no particular order.
Overall
I’ve been fixing a fair number of bugs, most of which have been in the code for a while. Every bug I fix these days gets a test case to guard against regressions. I’ve integrated the tests into the Makefile, so there’s no way for me to forget to run them.
The test suite has hundreds of tests, which is probably pretty good in comparison to many projects of this type. However, there will probably never be enough tests. I’ve moved much (in some cases, almost all) of the code into modules, which are easy to test, but it’s always a little harder to test programs themselves, so some things aren’t tested. (For example, it’s tedious to set up a test case that requires many MySQL instances to be running in a multi-tier replication setup).
Still, I think the quality has increased a lot in the last 6 months or so, since I’ve been more disciplined about tests. That discipline, by the way, was forced on me. The mk-table-sync tool was completely unmanageable. I was able to rewrite that tool in December, almost entirely using modularized, tested code.
mk-heartbeat
Jeremy Cole and Six Apart originally contributed this tool. Since then I’ve added a lot more features, allowed a lot more control over how it works, and it even works on PostgreSQL now. As an example, I added features that make it easy to run every hour from a crontab. It daemonizes, runs in the background, and then quits automatically when the new instance starts. I use it in production to give me a reliable metric for how up-to-date a slave is. When I need to know absolutely “has this slave received this update,” Seconds_behind_master won’t do, for many reasons. Load balancing and lots of other things hinge on up-to-date slaves.
mk-parallel-dump
I think this tool is probably the fastest, smartest way to do backups in tab-delimited format. I’ve been fixing a lot of bugs in this one, mostly for non-tab-delimited dumps. It has turned out to be harder to write this code because it uses shell commands to call mysqldump. (The tab-delimited dumps are done entirely via SQL, which is why it’s so good at what it does).
mk-slave-restart
I’ve been having a lot of trouble with relay log corruption, so unfortunately this tool has become necessary to use regularly in production. As a result I made it quite a bit smarter. It can detect relay log corruption, and instead of the usual skip-one-and-continue, it issues a CHANGE MASTER TO, so the slave will discard and re-fetch its relay logs. I’ve also made it capable of monitoring many slaves at once. (It discovers slaves via either SHOW SLAVE HOSTS or SHOW PROCESSLIST, so if you point it at a master, it can watch all the master’s slaves with a single command).
mk-table-checksum
I’ve made a lot of changes to this tool recently. Smarter chunking code to divide your tables into bits that are easier for the server to work with, TONS of small improvements and fixes, and much friendlier behavior.
The most recent release also includes a big speed improvement. Most of the time this tool spends is waiting for MySQL to run checksum queries. While my pure-SQL checksum queries are faster than most (all?) other ways to compare data in different servers, I’ve recently been trying to reduce the amount of work they cause.
As a result, I investigated Google’s MySQL patches. Mark Callaghan mentioned to me that he’d added a checksum function into their version of the server, and I wanted to look at that. They’re using the FNV hash function to checksum data. I decided that a UDF would be a fine way to write a faster row-checksum function, so I wrote a 64-bit FNV hash UDF. While I’m not the first person to do that, my version accepts any number of arguments, not just one. This makes it a lot more efficient to checksum every column in a row, because you don’t have to a) make multiple calls to the hash function or b) concatenate the arguments so you can make a single call. I also copied Google’s logic to make it simpler and more efficient to checksum NULLs, which avoids still more function calls. The UDF returns a 64-bit number, which can be fed directly to BIT_XOR to crush an entire table (or group of rows) into a single order-independent checksum. And finally, FNV is also a lot faster than, say, MD5 or SHA1.
The results are quite a bit faster for my hardware: 12.7 seconds instead of 80 seconds on a CPU-bound workload. So that’s at least a 6.2x speedup. (80 seconds was the best I was able to achieve before. Some of the checksum techniques used up to 197 seconds on the same data).
The UDF is really simple to compile and install, does no memory allocations or other nasty things, and should be safe for you to use. The source is included with the latest Maatkit release. (Older Maatkit versions won’t be able to take full advantage of it, by the way, but they can still be sped up somewhat). However, I would really appreciate some review from more experienced coders. I’m no C++ wizard. In fact, my first attempts at writing this thing were so blockheaded and wrong, I was almost embarrassed. (Thanks are due to the fine people hanging out on #mysql-dev).
mk-table-sync
After my week-long coding marathon on this in December, I’ve needed to continue working on this. I’ve needed it quite a few times to solve problems with replication. (Did I mention relay log corruption?). It’s much faster and less buggy now, and as a bonus, the latest release can also take advantage of the FNV UDF I just mentioned.
I think I should explain the general evolution in this tool’s life. It started out as “how to find differences in data efficiently.” This was a period where I did a lot of deep thinking on exploiting the structures inherent in data. It then progressed to “how to sync data efficiently.” At this point I was able to outperform another data-syncing tool by a wide margin, even though it was a multi-threaded C++ program and mine was just a Perl script. I did that by writing efficient queries and moving very little data across the network.
The most recent incarnation has thrown performance out the window, at least as measured by those criteria. The aforementioned C++ program now outperforms mine by a wide margin on the same tests.
What changed?
Two things: I’m focusing on quality, and I’m focusing on syncing running servers correctly with minimal interruption.
Once I have good-quality, well-tested code, I’ll be able to speed it up. I know this because I’m currently doing some things I know are slower than they could be.
But much more importantly, I’ve changed the whole angle of the tool. I want to be able to synchronize a busy master and slave, without locking tables, automatically ensuring that the data stays consistent and there are no race conditions. I do this with a lot of special tricks, such as syncing tables in small bits, using SELECT FOR UPDATE to lock only the rows I’m syncing, and so on. And I’m actively working to make the tool Do The Right Thing without needing 99 command-line arguments. (I think the latest release does this very well).
Instead of “make the sync use as little network traffic as possible,” I’ve changed the criteria of good-ness to “do it right, do it once, and don’t get in the way.”
As a result, I can sync a table that gets a ton of updates — one of the “hottest” tables in my application — without interfering with my application. Online. Correctly. In one pass. Through replication. Show me another tool that can do that, and I’ll re-run my benchmarks :-)
This doesn’t mean I don’t care about performance. I do, and I’ll bring back the earlier “go easy on the network” sync algorithms at some point. They are very useful when you have a slow network, or your tables aren’t being updated and you just want to sync things fast. I’ll also be able to speed up the “don’t interfere with the application” algorithms.
One interesting thing I did was divide up the functionality so the tool can use many different sync algorithms. I created something like a storage-engine API, except it’s a sync API. It’s really easy to add in new sync algorithms now. All I have to do is write the code that algorithm needs. This is really only about 200-300 lines of code for the current algorithms.
Tools that don’t yet exist
What I haven’t told you about is a lot of unreleased code and new tools. There’s some good stuff in the works. Also stay tuned — a third party might be about to contribute another tool to Maatkit, which will also be a very neat addition.
Conclusion
As Dana Carvey says, “If I had more time… the programs we have in place are getting the job done, so let’s stay on course, a thousand points of light. Well, unfortunately, I guess my time is up.” Maatkit is getting better all the time, just wait and see.
Technorati Tags:backups, Jeremy Cole, Mark Callaghan, Michael Radwin, mysql, PostgreSQL, replication, Six Apart, sql, synchronization, UDF
You Were Saying?