Archive for the ‘Paul McCullagh’ tag
Here’s a rundown of Thursday (day 3) of the MySQL Conference and Expo. This day’s sessions were much more interesting to me than Wednesday’s, and in fact I wanted to go to several of them in a single time slot a couple of times.
Inside the PBXT Storage Engine
This session was, as it sounds, a look at the internals of PBXT, a transactional storage engine for MySQL that has some interesting design techniques. I had been looking forward to this session for a while, and Paul McCullagh’s nice explanations with clear diagrams were a welcome aid to understanding how PBXT works. Unlike some of the other storage engines, PBXT is being developed in full daylight, with an emphasis on community involvement and input. (Indeed, I may be contributing to it myself, in order to make its monitoring and tuning capabilities second to none).
PBXT has not only a unique design, but a clear vision for differentiating itself from other transactional storage engines. It’s not trying to clone any particular engine; Paul and friends are planning to add some capabilities that will really set it apart from other engines, including high-availability features and blob streaming.
I left this session with a much better understanding of how PBXT balances various demands to satisfy all sorts of different workload characteristics, how it writes data, how it achieves transactional durability, and so on. I think these capabilities, and its performance, can really be assessed only in the real world (of course), but in principle it sounds good. I love knowing how things work!
There were about 30 people in the talk. I wish there had been more, because I think PBXT is going to be an important part of the open ecosystem going forward. However, I feel pretty confident people will take more notice if it starts to get used in the real world. Someone had a video camera there, so you might check out the video when it’s available. Paul’s explanations are really good.
Helping InnoDB Scale on Servers with Many CPU Cores and Disks
This session was Mark Callaghan’s chance to unveil the work he and others have been doing on InnoDB’s scalability issues, which mostly revolve around mutex contention. Mark’s team has completely solved the problems on their workload and benchmarks. In fact, after the changes, InnoDB exhibited significantly better performance even than MyISAM, which began to be limited by the single mutex that synchronizes access to its key cache. (Yes, in fact MyISAM has scalability problems too).
Google’s workload for MySQL, in case you’re wondering, is pretty traditional (i.e. not web-like; more like an “enterprise” application). Heavily I/O-bound, 24/7 critical systems, and so on.
Mark also wore several community t-shirts at various points in the talk, including one of my Maatkit t-shirts. Mark said Maatkit would be perfect if only it were written in Python (Google’s preferred scripting language). Alas, Mark, it’ll stay in Perl. But thanks for the nice compliment anyway.
The room was packed full.
Scaling Heavy Concurrent Writes In Real Time
Dathan Pattishall, formerly the lead architect at Flickr, explained his techniques for scaling Flickr’s write capacity. He talked about how he’d worked to reduce primary key sizes, queued writes for batching, separated different types of data into different types of tables, and more. Dathan has never been afraid to do what he thinks is a good idea, even if it flies in the face of “best practices,” so I was happy to finally hear him talk.
By the way, Dathan pointed out that distributed locking with memcached and
add() isn’t a silver bullet. It works ok until memcached evicts your lock due to the LRU policy. He uses MySQL’s built-in
GET_LOCK() function for locking.
Dathan’s blog is a good source of information about his sometimes unorthodox approaches to database design.
The Power of Lucene
This was the only one of Frank (Farhan) Mashraqi’s talks I got to attend. This was pretty technical: how Lucene works, how to configure and install it, how to index documents, how to execute searches. If you were wondering how much work and complexity it would be to install and use Lucene, this talk would have been good for you to attend; I’ve never used it myself, but I’m pretty sure Frank covered everything you need to know.
Today is the first day at the conference (aside from the tutorials, which were yesterday). Here’s what I went to:
New Subquery Optimizations in 6.0
By Sergey Petrunia. This was a similar session to one I went to last year. MySQL has a few cases where subqueries are badly optimized, and this session went into the details of how this is being addressed in MySQL 6.0. There are several new optimization techniques for all types of subqueries, such as inside-out subqueries, materialization, and converting to joins. The optimizations apply to scalar subqueries and subqueries in the FROM clause. Performance results are very good, depending on which data you choose to illustrate. The overall point is that the worst-case subquery nastiness should be resolved. I’m speaking of WHERE NOT IN(SELECT…) and friends. It remains to be seen how this shakes out as 6.0 matures, and what edge cases will pop up.
The Lost Art Of the Self Join
This was just great. Among many other things, Beat Vontobel showed how a Su Doku can be solved entirely with declarative queries: a very large self-join query against a table of digits and a table of the board’s initial state. I had been promoting this session because last year’s was so very good. I can’t wait to see what he comes up with for next year. Can he find another creative idea? Time will tell.
He wasn’t able to solve a 9×9 puzzle with MySQL because of the limitation on the number of joins, but PostgreSQL had no trouble doing it.
This was my session, of course. (Slides will be on the O’Reilly conference site, if they aren’t already). It went great, I thought. The room was full and people were standing in the back of the room and in the door. The questions came fast and furious; all really good questions. I think we ended up exploring a lot of the MySQL query execution method, strengths, and weaknesses by the time we were through. And I gave away all the remaining Maatkit t-shirts. Hopefully the people who took them will wear them tomorrow and the conference will be sea of deep, rich red shirts.
Someone did an audio recording of the session, but I don’t recall who it was.
Investigating InnoDB Scalability Limits
This session was given by Peter Zaitsev (disclosure: I now work for Percona, the company he co-founded). Peter and Vadim Tkachenko spent a lot of time over the last weeks and months running a dizzying array of benchmarks on MySQL 5.0.22, 5.0.51, and 5.1.24 (if I recall the versions correctly). They were able to show InnoDB’s scaling patterns for a number of different micro-benchmarks on a variety of configurations. If you didn’t attend, please look up the slides if you care about InnoDB performance. A lot of work went into the benchmarks — a lot of work. The slides should be on the conference website or on our blog, http://www.mysqlperformanceblog.com/.
Replication Tricks and Tips
Lars Thalmann and Mats Kindahl gave this session. At a high level, I’d say it was a run-down of all the different ways you can use MySQL replication. Replication is really a flexible tool, and they covered a large array of the most important ways you can use it to achieve different purposes. Many of the techniques they mentioned are implemented by various tools in Maatkit. A couple of the others are implemented in MySQL Master Master Manager and MySQL Semi Multi-Master tools. Don’t re-code these! You can save weeks of work and get quality code by using the pre-built tools. (I built Maatkit, so I know exactly how tricky it is to get some of these things right.)
I dropped in on a few BoF sessions, including the Sphinx one and the PBXT/Blob Streaming one. (Keep an eye on the PrimeBase folks — they are up to great things.) Ronald Bradford protected me from those who wanted to get me drunk. Hint: it’s really easy… I have to say, though, Monty’s black vodka was amazing.
Speaking of Blob Streaming, Paul McCullagh and I were talking earlier in the day about the project’s name, MyBS. This has been smirked about a few times. I think it’s a great name, because after all my initials are BS (I usually insert one of my four middle names in to alleviate this problem, but I digress). The conversation went like this:
Me: I like it. My initials are BS.
Paul: BS actually means British Standard, so it can’t be bad.
Me: Better than American Standard. That’s a toilet.
We also debated the merits of watching the original move The Blob. It’s a classic. It must be good.
In the first two articles in this series, I discussed archiving basics, relationships and dependencies, and specific archiving techniques for online transaction processing (OLTP) database servers. This article covers how to move the data from the OLTP source to the archive destination, what the archive destination might look like, and how to un-archive data. If you can un-archive easily and reliably, a whole new world of possibilities opens up.
How to move the data
At some point you have to actually take the data from the source and put it into the archive. This is a three-step process:
- Find archivable rows
- Insert them into the archive
- Delete them from the source
I wrote an article on how to find archivable rows efficiently, so I won’t go into it more here. Inserting and deleting are usually straightforward, but there are subtleties and tricks that can lead to nifty solutions.
The most important question about actually moving the data is how to do it safely, with or without transactions. Even if the source and archive are on different servers, you can do distributed transactions, either in your application logic or with a two-phase commit supported by your database product. For most purposes, I’ve found it just as reliable and more efficient to handle the transaction in your application logic.
For many of the reasons mentioned in the second article in this series, I would recommend relaxing the consistency requirements between source and archive, so you can keep the archived data out of the source’s transaction. You can do this safely by performing the operations in the order I listed above: insert, delete, commit the insert, then commit the delete. If you are archiving to a file at the same time, you can also write to the file before the insert.
Your archive might also be non-transactional. If you’re using MySQL, you should think about using a faster non-transactional engine that stores the data more compactly, such as MyISAM, for the archived data.
Use replication to unload the OLTP server
One of the most effective ways to archive an OLTP server without impacting it too much is to do the hard work of finding and inserting on a slave server, then performing the delete on the master and letting it flow through replication. Here’s an example from a past employer: we replicated the order table to a “transfer” database on the data warehouse server. A job on the data warehouse server looked for orders that had completed and shipped, and thus could be archived. It copied these in a transaction to the long-term storage, then deleted on the OLTP server. This delete flowed through replication back to the data warehouse, and removed the rows from the transfer database.
The archive server
I’ve already mentioned some ways you might design the archive server, but there are a few other things I want to cover briefly. The first is what happens when you don’t use a different server at all, and just archive to adjacent tables on the OLTP server. This can be a fine way to do things. As long as the data isn’t used, it’s just taking up space on disk. However, it might make backups more difficult.
If you use a different server to hold the archived data, you should probably consider some kind of partitioning scheme. If your server doesn’t support partitioned tables natively, you might want to archive into a different table every so often, building views over all the tables to present them as a single logical table. There are some important advantages to this, especially if you eventually want to purge really old data. It is much easier to drop an entire table when it ages out than to delete rows from a huge table.
This gets into the topic of how to build a large data warehouse, so I’ll just leave it at this: if you forsee the archived data getting large, start planning for it now.
Unless you use distributed transactions or some clever way to guarantee atomicity, there’s a chance you’ll insert to your archive but fail to delete from the source. Now you have duplicate data. How will you handle this?
First, decide if you can tolerate the situation. (I told you we hadn’t seen the last of the consistency requirements!) I suggest you take it as a given, if at all possible. Design your archiving jobs so they can tolerate existing data in case they get terminated uncleanly or otherwise have errors. If they try to insert rows that exist, you should probably overwrite the existing rows with new ones, which might have changed on the OLTP server. Make sure you don’t lose data from this, one way or another.
If you are archiving summary tables, you might need to be careful. A row that’s built incrementally on the OLTP system might need to be re-aggregated, instead of replaced, if it already exists in the archive.
Duplicated data makes some queries hard to get consistent. For instance, a view that takes the union of archived and un-archived data will tell a lie if a row exists in both places. You need to factor this in when deciding how to do the archiving. Duplicates can also happen during un-archiving.
Why would you ever want to un-archive?
Here are some reasons you might benefit from being able to un-archive easily:
- You treat all the data as equally important, but some of it as more likely to be accessed
- You know there’s unused data but it’ll be inefficient to figure out which rows
- You can’t get an exact answer on whether rows are archivable
Think of it this way: the ability to un-archive lowers the risk of archiving the wrong data, and allows you to archive things you might otherwise be unable to touch. It takes away or mitigates the downside.
This goes back to my analogy of archiving as flushing from a cache. You probably don’t treat databases as a multi-tier cache, and that’s a good thing. If the data isn’t where you expect, your applications would need to look elsewhere and retrieve it. Unless you write a wrapper around your database access layer that handles it automatically, this is probably infeasible.
However, you can still use the concept of retrieving missing data under certain circumstances. Does the following sound workable?
- Make most applications tolerate missing data and just do what they can with the data they have
- Identify points of entry where incoming data is a signal to un-archive something
- Hook an un-archiving system into the points of entry
- Archive freely and let un-archiving bring back anything it needs to
Here’s a concrete example from the advertising system I mentioned previously. This system archives ads eagerly; if they don’t have any traffic in a while, it moves them away. There are limited points of entry for ad-related data: traffic data, and a record of a sale that is attributed to an ad. The systems that load this incoming data can simply check whether all referenced ads exist, and if not, attempt to un-archive any missing ones. This happens as part of the routine checks for bad incoming data. This approach is fairly new for us, and might have some kinks we haven’t yet discovered, but there is virtually no downside. The data isn’t gone, it’s just elsewhere. Now we can archive data we couldn’t before, because it was too hard to get a definite “yes, this can be archived” answer. (It’s often easy to get a “no,” but hard to get a “yes.”)
Un-archiving is non-trivial. In fact, depending on your schema, you might need to be more careful about consistency requirements than you are with your archiving strategy. However, if you’re archiving correctly, un-archives should be few and far between, so you can afford a more expensive process.
In many ways, your options for un-archiving strategies might be similar to archiving strategies. In the systems I’ve worked on, we take the depth-first-search, dependencies-first, all-one-chunk approach I think is too inefficient to use for archiving.
If your archive is non-transactional, be careful to commit the insert into the OLTP system before you delete from the archive. Otherwise your delete will be permanent, but your insert might be rolled back, and the data is lost.
You don’t have to un-archive
If you don’t want to build an un-archiving system, you can build your applications to look in the archive for data they need and can’t find in the OLTP system. If you do this seldom enough, it might work fine. One order-history system I know of does this to find orders that aren’t in the OLTP server.
To round out this series, here is a collection of notes and references I didn’t want to mix in along the way, but I think belong here somewhere.
First, if you’re using MySQL, I’ve written tools that can handle much of the work I’ve described here. The first is MySQL Archiver, which can find and move archivable data very efficiently. It is full-featured and should handle most needs as-is, but it’s also extensible, so you can plug your own code in to handle complex archivability checks, dependency-first archiving, and so forth. Another of my tools, MySQL Find, can monitor and create historical records of table sizes, so you can get a sense of which tables are largest and/or growing the fastest (this is a one-liner that can go into your daily crontab). If you are archiving from InnoDB tables, you might want to record deadlocks with MySQL Deadlock Logger, for obvious reasons.
A couple more notes on MySQL: the choice of storage engines makes MySQL especially attractive for single-server archiving solutions, in my opinion. It’s quite practical to run your intense OLTP workload on InnoDB tables (or another transactional storage engine, such as Paul McCullagh’s PBXT) and store the archive tables side-by-side in MyISAM or ARCHIVE tables. If you’re using MySQL 5.1, you also get partitioned tables; if you’re not that bleeding-edge, you might consider the strategy I suggested at the recent MySQL Conference and Expo: archive to a small InnoDB table for high concurrency, then regularly convert it to MyISAM and place it into a MERGE collection, with a view that unions the InnoDB and MERGE tables (Sheeri Critzer blogged about this also, though I’m not sure how many people are actually doing it).
I don’t really like triggers and foreign key magic, so I relegated this suggestion to here: you can use triggers and/or foreign key constraints with
ON UPDATE actions to help with archiving and purging. I don’t like these approaches because I think they’re hidden, dangerous code. In Microsoft SQL Server I usually used stored procedures to archive, but in MySQL these days I use MySQL Archiver (linked above).
MySQL’s Edwin DeSouza wrote me to bring my attention to some of Craig Mullins’ recent articles about archiving. Craig’s insight is valuable if you’re researching archiving.
I think that’s it for the miscellany.
This series has covered what I believe to be the full scope of archiving strategies, from requirements to specific techniques you can use to archive and un-archive data from your OLTP systems. As a reminder, the larger context here is to offer scaling back as an alternative to the usual scale-up-or-scale-out dichotomy. There are always more options than you think! Archiving can be difficult and complex, but the potential gains for some types of data can be large, especially compared to some of the more frequently-discussed scaling tactics. Like other solutions, it doesn’t work for all situations, but if you forsee a huge amount of data coming at you, you should consider archiving along with other scaling techniques.