Archive for the ‘pbxt’ tag
Hindsight on a scalable replacement for InnoDB
A while ago I posted about a comment a Sun performance engineer made about a scalable replacement for InnoDB. At the time, I did not believe it referred to Falcon. In hindsight, it seems even clearer that the Sun performance experts were already working hard on InnoDB itself.
Sun’s engineers have shown that they can produce great results when they really take the problems seriously. And I’m sure that InnoDB’s performance has untapped potential we don’t see right now. However, it does not follow that their work on InnoDB is what was meant by a scalable replacement for InnoDB. Or does it?
General-purpose MVCC transactional storage engines with row-level locking, whatever their performance and scaling characteristics in edge cases, fall into a category together. A person assembling a MySQL server for general-purpose use might choose a different storage engine for various uses — MyISAM here, Memory there… and use “one of those transactional engines” for the bulk of the work. PBXT, InnoDB, Falcon — I don’t see a justification for running more than one of those side by side. The operational costs alone (backups, training the users, etc) would be too high. It is also not at all clear that MySQL itself is ready for multiple transactional storage engines working together (e.g. cross-engine transactions) in the real world.
So what’s left for Falcon? I think they are asking themselves the same question (brilliant gallows humor, by the way). I think Falcon’s ideas and techniques are very interesting, but a storage engine — especially one with such lofty goals — is always a show-me undertaking that will require years to mature and prove itself even after the code is “ready.” With or without the Oracle acquisition, this question has loomed for years: where’s the justification for Falcon politically, functionally, economically? A third party engine such as PBXT, with eyes on replication at the storage engine level and other add-on functionality, has always seemed more likely to really add value than a straight-up InnoDB replacement.
But from my point of view, the biggest win in the short term would still be to drive InnoDB development forward at a consistent and accelerating pace to meet the needs of users and the advances in hardware. Of course, that’s what XtraDB set out to do, and I think the XtraDB project has helped snap InnoDB out of their Percheron-like plod towards improvement. This is nothing but good; when it comes to competition among storage engines, no one should be resting on their laurels. I also see that Sun’s team has more good things in the works, which is great. I’d love for InnoDB to stop being a work horse and start being a quarter horse. We need it to be both scalable and high-performance.
MySQL Conference and Expo 2008, Day Three
Here’s a rundown of Thursday (day 3) of the MySQL Conference and Expo. This day’s sessions were much more interesting to me than Wednesday’s, and in fact I wanted to go to several of them in a single time slot a couple of times.
Inside the PBXT Storage Engine
This session was, as it sounds, a look at the internals of PBXT, a transactional storage engine for MySQL that has some interesting design techniques. I had been looking forward to this session for a while, and Paul McCullagh’s nice explanations with clear diagrams were a welcome aid to understanding how PBXT works. Unlike some of the other storage engines, PBXT is being developed in full daylight, with an emphasis on community involvement and input. (Indeed, I may be contributing to it myself, in order to make its monitoring and tuning capabilities second to none).
PBXT has not only a unique design, but a clear vision for differentiating itself from other transactional storage engines. It’s not trying to clone any particular engine; Paul and friends are planning to add some capabilities that will really set it apart from other engines, including high-availability features and blob streaming.
I left this session with a much better understanding of how PBXT balances various demands to satisfy all sorts of different workload characteristics, how it writes data, how it achieves transactional durability, and so on. I think these capabilities, and its performance, can really be assessed only in the real world (of course), but in principle it sounds good. I love knowing how things work!
There were about 30 people in the talk. I wish there had been more, because I think PBXT is going to be an important part of the open ecosystem going forward. However, I feel pretty confident people will take more notice if it starts to get used in the real world. Someone had a video camera there, so you might check out the video when it’s available. Paul’s explanations are really good.
Helping InnoDB Scale on Servers with Many CPU Cores and Disks
This session was Mark Callaghan’s chance to unveil the work he and others have been doing on InnoDB’s scalability issues, which mostly revolve around mutex contention. Mark’s team has completely solved the problems on their workload and benchmarks. In fact, after the changes, InnoDB exhibited significantly better performance even than MyISAM, which began to be limited by the single mutex that synchronizes access to its key cache. (Yes, in fact MyISAM has scalability problems too).
Google’s workload for MySQL, in case you’re wondering, is pretty traditional (i.e. not web-like; more like an “enterprise” application). Heavily I/O-bound, 24/7 critical systems, and so on.
Mark also wore several community t-shirts at various points in the talk, including one of my Maatkit t-shirts. Mark said Maatkit would be perfect if only it were written in Python (Google’s preferred scripting language). Alas, Mark, it’ll stay in Perl. But thanks for the nice compliment anyway.
The room was packed full.
Scaling Heavy Concurrent Writes In Real Time
Dathan Pattishall, formerly the lead architect at Flickr, explained his techniques for scaling Flickr’s write capacity. He talked about how he’d worked to reduce primary key sizes, queued writes for batching, separated different types of data into different types of tables, and more. Dathan has never been afraid to do what he thinks is a good idea, even if it flies in the face of “best practices,” so I was happy to finally hear him talk.
By the way, Dathan pointed out that distributed locking with memcached and add() isn’t a silver bullet. It works ok until memcached evicts your lock due to the LRU policy. He uses MySQL’s built-in GET_LOCK() function for locking.
Dathan’s blog is a good source of information about his sometimes unorthodox approaches to database design.
The Power of Lucene
This was the only one of Frank (Farhan) Mashraqi’s talks I got to attend. This was pretty technical: how Lucene works, how to configure and install it, how to index documents, how to execute searches. If you were wondering how much work and complexity it would be to install and use Lucene, this talk would have been good for you to attend; I’ve never used it myself, but I’m pretty sure Frank covered everything you need to know.
MySQL Conference and Expo 2008, Day One
Today is the first day at the conference (aside from the tutorials, which were yesterday). Here’s what I went to:
New Subquery Optimizations in 6.0
By Sergey Petrunia. This was a similar session to one I went to last year. MySQL has a few cases where subqueries are badly optimized, and this session went into the details of how this is being addressed in MySQL 6.0. There are several new optimization techniques for all types of subqueries, such as inside-out subqueries, materialization, and converting to joins. The optimizations apply to scalar subqueries and subqueries in the FROM clause. Performance results are very good, depending on which data you choose to illustrate. The overall point is that the worst-case subquery nastiness should be resolved. I’m speaking of WHERE NOT IN(SELECT…) and friends. It remains to be seen how this shakes out as 6.0 matures, and what edge cases will pop up.
The Lost Art Of the Self Join
This was just great. Among many other things, Beat Vontobel showed how a Su Doku can be solved entirely with declarative queries: a very large self-join query against a table of digits and a table of the board’s initial state. I had been promoting this session because last year’s was so very good. I can’t wait to see what he comes up with for next year. Can he find another creative idea? Time will tell.
He wasn’t able to solve a 9×9 puzzle with MySQL because of the limitation on the number of joins, but PostgreSQL had no trouble doing it.
EXPLAIN Demystified
This was my session, of course. (Slides will be on the O’Reilly conference site, if they aren’t already). It went great, I thought. The room was full and people were standing in the back of the room and in the door. The questions came fast and furious; all really good questions. I think we ended up exploring a lot of the MySQL query execution method, strengths, and weaknesses by the time we were through. And I gave away all the remaining Maatkit t-shirts. Hopefully the people who took them will wear them tomorrow and the conference will be sea of deep, rich red shirts.
Someone did an audio recording of the session, but I don’t recall who it was.
Investigating InnoDB Scalability Limits
This session was given by Peter Zaitsev (disclosure: I now work for Percona, the company he co-founded). Peter and Vadim Tkachenko spent a lot of time over the last weeks and months running a dizzying array of benchmarks on MySQL 5.0.22, 5.0.51, and 5.1.24 (if I recall the versions correctly). They were able to show InnoDB’s scaling patterns for a number of different micro-benchmarks on a variety of configurations. If you didn’t attend, please look up the slides if you care about InnoDB performance. A lot of work went into the benchmarks — a lot of work. The slides should be on the conference website or on our blog, http://www.mysqlperformanceblog.com/.
Replication Tricks and Tips
Lars Thalmann and Mats Kindahl gave this session. At a high level, I’d say it was a run-down of all the different ways you can use MySQL replication. Replication is really a flexible tool, and they covered a large array of the most important ways you can use it to achieve different purposes. Many of the techniques they mentioned are implemented by various tools in Maatkit. A couple of the others are implemented in MySQL Master Master Manager and MySQL Semi Multi-Master tools. Don’t re-code these! You can save weeks of work and get quality code by using the pre-built tools. (I built Maatkit, so I know exactly how tricky it is to get some of these things right.)
BoF Sessions
I dropped in on a few BoF sessions, including the Sphinx one and the PBXT/Blob Streaming one. (Keep an eye on the PrimeBase folks — they are up to great things.) Ronald Bradford protected me from those who wanted to get me drunk. Hint: it’s really easy… I have to say, though, Monty’s black vodka was amazing.
Speaking of Blob Streaming, Paul McCullagh and I were talking earlier in the day about the project’s name, MyBS. This has been smirked about a few times. I think it’s a great name, because after all my initials are BS (I usually insert one of my four middle names in to alleviate this problem, but I digress). The conversation went like this:
Me: I like it. My initials are BS.
Paul: BS actually means British Standard, so it can’t be bad.
Me: Better than American Standard. That’s a toilet.
We also debated the merits of watching the original move The Blob. It’s a classic. It must be good.


