Archive for the ‘SQL’ Category
A review of MySQL Administrator’s Bible
MySQL Administrator’s Bible by Sheeri K. Cabral and Keith Murphy, 2009. Page count: 800+ pages. (Here’s a link to the publisher’s site.)
This book is a comprehensive reference guide to MySQL that’s accessible to beginning DBAs or DBAs familiar with another database. It has enough detail to be a useful companion throughout a DBA’s career. It also covers many related technologies, such as memcached, at a moderate-but-useful level of detail. This isn’t exactly a how-to book, and it isn’t exactly a reference manual; it’s more of a blend of the two.
The audience will depend on personal preferences. Some of the reference material is the type of thing I would look up with command-line --help options or the MySQL manual. But there are times when the reference aspect of the book is uniquely valuable. For example, the online documentation tends to list things alphabetically; the book might break them down into groups by function. An example is the sql_mode parameters, which it groups into categories like “Getting rid of silent failures, silent conversions, and silently allowing invalid data.”
The non-reference aspect of the book has a lot of examples of how to do things, such as how to set up replication over SSL. This is exactly what I’d look for in a book. Otherwise, you’re reduced to reading documentation (inefficient, mind-numbing) or trusting the information you find online, which is generally not something I do.
Speaking of trusting information, I was happy to see very few typos or errors. Occasionally I caught a minor slip. For example, when discussing the limited memory a 32-bit mysqld can use because it runs in a single process, there’s a typo that mis-states this architectural feature as “mysqld is currently single-threaded,” which is not quite the same thing. Overall, you can rely on the information you’ll read in this book.
The book is divided into four parts: first steps, developing with MySQL, core administration, and a set of chapters and appendixes grouped under extending your skills. I think this organization works well. You can read the full Table Of Contents at the publisher’s site linked above.
Coverage is for MySQL 5.1 and 6.0. As we know, 5.1 was GA’ed and 6.0 has been killed and replaced by a new release policy. This gives a slightly odd feeling to some passages, which speak about 5.1 in the past tense and 6.0 in the present tense! As far as I know, however, this book contains the most complete coverage of MySQL 5.1 in print. The only other similar book I’ve read that covers 5.1 is High Performance MySQL 2nd Edition — and that one is a bit light on details because there wasn’t a lot of production knowledge of 5.1 yet (I’m the lead author of HPM2e, by the way).
Speaking of which, I think that MySQL Administrator’s Bible is a very good complement to High Performance MySQL, 2nd Edition. The former is useful to people who have varying levels of knowledge, while the latter assumes a lot of experience and doesn’t cover introductory material much. And the books have different topics, of course. So if you’re new to databases, or if you’re new to MySQL, you might do well to start with Sheeri’s book, then continue or supplement your education with ours.
All in all, this is a book that’s well worth buying if you’re going to administer a serious MySQL installation. I tip my hat to Sheeri — I don’t know how she did it. It’s a huge project and she pulled through the last (and by far most difficult) part of it by herself.
Disclosure: I tried to be a tech reviewer for this book, but I was over-committed and had to back out after a while.
Seeking input on a badness score for query execution
Suppose that you’re writing a new Maatkit tool (just a random example, really) and its job is to measure the difference in execution of queries. The simplest metric is execution time.
Now suppose that you’re trying to figure out a metric of badness. The query executes in a second on machine 1 and 1000 seconds on machine 2. That’s a pretty bad change. How do you quantify this?
Now you’ve got a query that executes in 1ms on machine 1, and 10ms on machine 2. It’s a tenfold change. Is it a bad change? Maybe it’s just the difference in which files were cached in memory, or network latency because someone flooded the TCP pipe and the packets had to be backed off and retried, or something like that. Is this significant? How should it contribute to the badness score?
Let’s think of another example too. Later in this mythical tool’s life, we’ll be examining EXPLAIN and looking at row estimates. There are important differences between estimates of 1 row, 2 rows, 20 rows and 2000 rows. But from 2 to 3 rows, or from 100 to 125 rows — is that a significant change? How should it contribute to the badness score? What about this: a full table scan vs. an index scan, how should that contribute?
The general problem that I’m gesturing at here is a kind of generic badness score, which can be an accumulation of lots (dozens) of factors. From my thinking on it so far, it’s a very complex problem, because you want to avoid false positives, you want to really capture a badness score in a way that’s quantifiable and sortable (this query is badder than that one, all things considered) and you don’t want to miss small things in the noise (these queries are the same in 23 of the 24 metrics, and that 24th metric is enough to trigger the alarm).
The other thing that’s worked its way into my small brain is this: it’s got to be a solved problem (unless it’s really intractable). But I can’t think of the right combination of things to point me to the right Computer Science literature.
Help?
Recap of Southeast Linux Fest 2009
Last weekend, my brother and I attended SELF 2009. A few thoughts on it:
The mixture of sessions was interesting. There were some really good ones. I think the best session I attended was an OpenSolaris/NetBeans/Glassfish/Virtualbox/ZFS session, given by a Sun employee. He was an excellent presenter, and really showed off the strengths of the technologies in a nice way. He started up enough VMs to make his OpenSolaris laptop chew into swap, and I thought it was fun to see how it dealt with that. I’ve heard Solaris and OpenSolaris do a lot better at avoiding and managing swapping than GNU/Linux, but I couldn’t make any opinion from watching. I did think it was odd to have this session at a “Linux” (yes, they left off the GNU) conference. But I thought the session was a good addition to the conference. In other sessions, and in the hallways and expo, there was a lot more slant towards open-source software and gadgetry in general than there was towards GNU/Linux. The sessions that were about Linux or GNU/Linux were top-heavy towards topics like educational initiatives.
The Free Software Foundation had a booth in the expo hall. It was funny that they didn’t boycott the event, because I know RMS won’t speak at so-called “Linux User Groups” and insists they be called “GNU/Linux User Groups.” I guess the FSF is not unified behind that banner. Regardless, I used the opportunity to renew my membership perpetually. I’m so lazy that I need something like this to stay involved!
The expo hall was dominated by Red Hat, Fedora, and SUSE; PostgreSQL was there, but not MySQL. There was a good variety and number of vendors. It was great to see the healthy support of the event, which was free, by the way.
Clemson, SC is not easy to get to, and while the Clemson campus was attractive and functioned fine, it’s nothing you can’t find elsewhere. I ended up driving over 9 hours to get to it. I’d have preferred the technology triangle, which if nothing else is close to major airports, bus and train stops, and Red Hat.
Richard Hipp talked about the great fsync() bug, a similar talk to the one he gave at the first OpenSQL Camp. Someone asked about Tokyo Cabinet and he responded that he hasn’t found any fsync() calls in its source code. *cough* Something worth thinking about for on-disk usage (I haven’t looked at its source much myself). TC can also be used in-memory-only, and a while back I suggested that usage of it for Drizzle to replace the Memory engine; I don’t know what became of that.

