Archive for the 'High Performance MySQL' Category

High Performance MySQL Second Edition Schedule

I just got the rest of the production schedule from the publisher, plus the PDF files for quality control, for our upcoming book. (Now I have to proofread the whole book!) This is the first time I’ve seen the entire production schedule. The book is supposed to go to the printer in the first week of June. I don’t know what the on-the-shelf date will be, but I think it will be very shortly after that. The publisher has promised that it’ll physically be on sale at Velocity.

I also took a peek at the PDFs. Without the appendixes, the last page of Chapter 14 (Tools for High Performance) is page 604. The appendixes bring it to 660 pages. That’s real material, not including tables of contents and indexes. So my estimate (620) was not too far off.

660 pages is not bad, considering that the contract was for 384 pages.

Another note: the marketing materials for the book emphasize that it covers MySQL 5.1. While this is true, I want to point out that we took a real-life approach: we write about what we’ve seen in the real world, and 5.1 is not yet widely deployed there. The book’s real value, as far as version-specific content goes, is its tremendous depth and breadth in MySQL 4.1 and 5.0. These have been “out there” for a long time, and among the four of us we’ve seen about every conceivable scenario with them. So you’ll get a lot of insight about current, production-ready, widely-used versions. Let the other guys speculate; we just report the facts. It’s not like there’s any shortage of things to say about 5.0, right?

Come to beCamp 2008

I’m going to be at beCamp 2008, the followup to the first beCamp, which I sadly missed.

beCamp is a BarCamp un-conference. Tonight was about meeting, greeting, and throwing ideas at the wall to see which ones stick. Literally. We stuck pieces of paper on the wall with our ideas — things we can either talk about or want to hear about — and then scratched our votes on them to see which are popular.

I live and breathe MySQL for a decent part of the day, so I hesitated, but then stuck “MySQL Performance” on the wall. It got quite a few votes, so I assume I will be giving a talk on MySQL performance basics at some point during the conference. (The exact schedule is probably being determined right now, in my absence, but I’m so tired right now that I’ll just take my chances on it not being at 8:00 AM tomorrow.) [edit: I just checked the website and there won’t be anything before 9:00, and the schedule will be determined tomorrow. I did say I’m tired, right?]

See you there!

PS: if you want to meet some of my colleagues from my former employer, the Rimm-Kaufman Group, they’ll be there too, wearing the “We’re Hiring” t-shirts. They’re hiring, by the way.

Pre-Order High Performance MySQL Second Edition

If you’re waiting for High Performance MySQL Second Edition to hit the shelf, you’re not the only one. I am too! I can’t wait to actually hold it in my hands.

But you don’t have to wait idly. No, not at all! You can pre-order it and then you’ll get it as soon as possible. Plus your pre-order will help them figure out how much demand there is, so it doesn’t sell out and make you wait for your own copy.

Spring 2008 issue of MySQL Magazine

Keith Murphy and his hard-working crew have released the spring 2008 issue of MySQL Magazine. Go take a look — it includes quite a few articles on various topics, even a mention of our upcoming book (High Performance MySQL, Second Edition).

Get a free sample chapter of High Performance MySQL Second Edition

If you’re at the MySQL Conference and Expo, you can get a free sample chapter of the upcoming High Performance MySQL Second Edition. Just go to the exhibition area. As you go through the doors, take an immediate left and look for the sample chapter on O’Reilly’s table. It’s a rough draft and contains typos and my incredibly crude drawings instead of those that will go into the final book, but it should serve to give you an idea of the book’s depth and scope. Kudos to Andy Oram, our editor, who was able to get these done for us on very short notice.

A different angle on the MySQL Conference

There are quite a few business angles you might see only if you’re here at the conference, and that you won’t get from blogs. For example, let’s take a look at the contents of the shoulder bags they hand out with your registration. (This is only a partial list.)

  • SnapLogic’s flyer gets it right: their system is compatible with “GNU Linux.” Hooray, a commercial company acknowledging the GNU operating system for what it is!
  • MySQL Enterprise’s flyer has three big bullet points: MySQL Load Balancer, MySQL Connection Manager, and MySQL Enterprise Monitor Query Analyzer. The first two look like they’re probably built on MySQL Proxy. The last has a visual explain plan feature, which according to an elevator conversation is not yet built. I’ll stop by their booth and see. As you may know, Maatkit has long provided a tool that shows a visual explain plan, and it is designed for integration into other tools.
  • There’s an issue of Linux Journal, which does not get the GNU part right. And it has no articles about MySQL. Off-topic! Discarded!
  • Infobright’s flyer says they can load data nearly real-time. I don’t know how you read it, but to me that says “can’t quite keep up with how fast you generate data.” So… what good can it possibly be, right?
  • The conference bag itself has Zmanda’s logo on the side.
  • Webyog’s flyer has one side for SQLyog, and one for MONyog. Each side takes the sparse but visually appealing approach of shiny icons to present a feature list. My favorite is the “Find slow SQL” turtle.
  • JasperSoft’s flyer has soothing, professional blues and rich reds. It makes them look very trustworthy. (I’m not being snarky.) And they have lots of nice whitespace. It’s a little bit of a different look.
  • Kickfire’s marketing department is really on the ball. I’ve seen a large number of flyers and other materials from them (online and offline) and they just changed their name and created a new logo and look-and-feel a short time ago. How do they do it so fast?
  • O’Reilly has a bunch of half-sized flyers for their conferences. We should have asked them to throw in one about our upcoming book, the second edition of High Performance MySQL. Alas, opportunity lost. By the way, stop by the bookstore and grab a copy of the sample chapter.
  • Zmanda, not content with stamping the outside of the bag, has a half-flyer inside it too, plus a chance to win a Digital Rebel to lure you to their booth. If you’re doing backups the way a lot of people seem to, you might want to stop by their booth anyway…
  • There’s a CD for a free trial of WinSQL. But the CD case doesn’t say what the

Sorry. I have a short attention span.

High Performance MySQL 2nd Edition is in production

Just a quick note to say we have reached the production stage of the book project. Production is the process of transforming our OpenOffice.org files into the final page layout using a professional typesetting program.

As you can probably guess, this is later than we would have wished. This also means we won’t have the book for sale at the upcoming MySQL Conference and Expo. We will have a display copy at the O’Reilly booth at the conference, and you will be able to pre-order the book at a discount at that booth. (Several details remain to be worked out — do not trust the Amazon.com information on the book, as it is a weird blend of the first and second editions).

The book is very, very good. You will not be disappointed. I can’t think of a credible way to explain how good this book is — it’s just very, very good. Better than anything else you’ve ever read on the subject. So good that you will not want to share, because you’ll want to have your own copy handy for frequent reference (I currently refer to the OpenOffice.org files several times a week myself, and I wrote them!). But I’ll let you see for yourself. Buy a copy for yourself, your boss, your coworkers, and your mom. And your cat.

Henceforth, I dub thee GLAMP

I’ve decided to start replacing L with GL in acronyms where L supposedly stands for Linux.

I’m not a big user of acronyms, because I think they are exclusionist and they obscure rather than reveal. (This wouldn’t matter if I wrote for people who already knew what I meant and agreed with me, but that’s a waste of time.) However, LAMP is one that I’ve probably used a few times, without thinking that it is supposed to stand for Linux, Apache, MySQL, and PHP/Perl/Python. In fact, it doesn’t refer to Linux, it refers to GNU/Linux. Therefore, it should be GLAMP.

Why does this matter? I try not to say Linux, unless I’m referring to a kernel, because a kernel is not an operating system. I try to be pretty careful about saying GNU/Linux when I’m talking about an operating system. An exception is a recruiting event yesterday at the University of Virginia, where I compromised my principles because of the noise. Trying to explain myself at that decibel level was just beyond my willingness, so I said we use Linux. If the potential recruits hire on with us, they’ll get to hear me say GNU/Linux. And if they don’t, maybe they’ll attend Richard Stallman’s upcoming talk at the engineering school there on March 27th or 28th (sorry, it’s not listed online, so I can’t link to it).

And you’ll see GNU/Linux used conscientiously if you read the book I’m helping to write, too.

GNU matters. A lot. You may not think so, but if it ceased to exist, you’d find out. That applies equally even if you don’t think you are a Free Software user. You have no idea how much you rely on Free Software in your daily life. And the GNU project has been and continues to be a keystone in that arch of freedom.

Thanks to MySQL’s Brian Aker for snapping me out of my LAMP carelessness.

How pre-fetching relay logs speeds up MySQL replication slaves

I dashed off a hasty post about speeding up replication slaves, and gave no references or explanation. That’s what happens when I write quickly! This post explains what the heck I was talking about.

I first heard Paul Tuckfield talk at the first MySQL Camp, in November 2006. He mentioned that he speeds up MySQL replication by “pre-fetching relay logs” on the slave. Actually, I think he used the term “pipelining” at that point. The following spring, he mentioned the same thing in his keynote address at the 2007 MySQL Conference & Expo. You can download audio and video of his talk from that link. He mentions this approach pretty late in the talk, almost at the end. I’ve been meaning to try duplicating his idea since the first time I heard him talk.

The basic idea is to help overcome replication’s single-threadedness. Under the right conditions, the slave’s SQL thread can become I/O-bound, even though the slave server has lots of unused I/O capacity. As a result, it spends a lot of time just waiting for the disk to return some data, and becomes much slower than it has to be.

Paul’s solution to this problem is to read the statements from the relay log, just a little bit ahead of the SQL thread’s position, convert them into SELECT queries, and execute them on the slave. This causes MySQL to request some of the data from the disk in advance. Then when the slave’s SQL thread wants to update that data, it’s already in memory, and things can potentially go much faster.
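To make the rewriting step concrete, here is a minimal sketch in Python. It is not Paul's code. The function name to_prefetch_select and the regular expressions are hypothetical, and it only handles simple single-table UPDATE and DELETE statements that have a WHERE clause. Using SELECT * is a simplifying assumption to make the server read the rows themselves; a real tool would need a proper SQL parser and more care about what it asks for.

```python
import re

# Hypothetical patterns: they cover only simple, single-table statements
# with a WHERE clause. Anything else is deliberately left alone.
_UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(?P<table>\S+)\s+SET\s+.+?\s+WHERE\s+(?P<where>.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
_DELETE_RE = re.compile(
    r"^\s*DELETE\s+FROM\s+(?P<table>\S+)\s+WHERE\s+(?P<where>.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def to_prefetch_select(statement):
    """Return a SELECT that touches the rows the statement would modify,
    or None if the statement is not one we know how to rewrite."""
    for pattern in (_UPDATE_RE, _DELETE_RE):
        match = pattern.match(statement)
        if match:
            # Reuse the original WHERE clause so the same rows (and the
            # index pages needed to find them) get pulled into memory.
            return "SELECT * FROM {} WHERE {}".format(
                match.group("table"), match.group("where")
            )
    return None


if __name__ == "__main__":
    print(to_prefetch_select("UPDATE users SET name = 'x' WHERE id = 42;"))
    print(to_prefetch_select("INSERT INTO t VALUES (1)"))
```

Statements without a WHERE clause return None on purpose: pre-reading a whole table is exactly the kind of thing that would flush useful pages out of memory.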

How much faster is open to debate. I think Paul sees replication run about 3-4 times faster, but please don’t quote me on that. Farhan Mashraqui also uses this hack and gets some speedup.

The problem is, it won’t automatically work for everyone. In theory, it can help if the following are true:

  • Your data is much bigger than memory.
  • You use a storage engine with row-level locking, like InnoDB.
  • Your workload is mostly small (single-row is good), widely scattered, random UPDATE and DELETE statements. (INSERT is less likely to benefit, because the relevant indexes are likely to be “hot” already).
  • The slave’s SQL thread is I/O-bound, but the slave has lots of spare I/O capacity. In other words, lots of disk spindles.

My slaves don’t do this kind of work. They do a lot of big multi-table updates and summary queries. There is very little to gain from pre-fetching the indexes and data for these statements, because whatever big query the SQL thread is running is likely to just flush the pre-fetched pages out of memory again before they’re needed. I tried anyway, and sure enough, it didn’t work.

The other problem is, it’s hard to write a generically useful program to do this kind of pre-fetching. It’s not too hard to write something specific to your workload, as Farhan did. But getting it to work right in general requires a lot of smarts, such as figuring out how far ahead of the slave SQL thread to stay, which queries not to try to pre-execute, and so on. I wrote an implementation of it that’s generic and has some intelligence built in. If you’re interested in it, see my previous post (linked at the top of this post).
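As a rough illustration of the two decisions mentioned above (how far ahead to stay, and which queries to skip), here is a small Python sketch. Everything in it is an assumption made for illustration: the function name should_prefetch, the byte thresholds, and the list of skipped statement types are hypothetical and not taken from my implementation or from Paul's. It also assumes both positions are byte offsets in the same relay log file.

```python
# Hypothetical list of statement types we refuse to pre-execute.
# INSERT rarely benefits, and LOAD DATA INFILE can create large temp files.
SKIP_PREFIXES = ("INSERT", "LOAD DATA", "CREATE", "ALTER", "DROP")


def should_prefetch(statement, event_pos, sql_thread_pos,
                    min_ahead=4096, max_ahead=1048576):
    """Decide whether to pre-execute a relay log event.

    event_pos and sql_thread_pos are byte offsets in the same relay log.
    min_ahead and max_ahead are arbitrary example thresholds, in bytes.
    """
    distance = event_pos - sql_thread_pos
    # Too close (or already behind): the SQL thread will reach the event
    # before the pre-fetch can finish, so the work would be wasted.
    if distance < min_ahead:
        return False
    # Too far ahead: the pages may be evicted again before they are needed.
    if distance > max_ahead:
        return False
    # Skip statement types that are unlikely to benefit or are risky.
    return not statement.lstrip().upper().startswith(SKIP_PREFIXES)


if __name__ == "__main__":
    print(should_prefetch("UPDATE t SET a = 1 WHERE id = 5", 10000, 1000))
    print(should_prefetch("LOAD DATA INFILE 'x' INTO TABLE t", 10000, 1000))
```

The right window depends entirely on the hardware and workload, which is why fixed numbers like these would need tuning (or replacing with something adaptive) on a real slave.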

If you’re thinking about writing something like this yourself, be prepared: it could be a lot of work. I can see how it would be simpler on some workloads, but on mine it was far from simple. I did some silly things, like running out of disk space because of temp files for LOAD DATA INFILE statements. Fortunately, that was just one of my benchmark machines.

If conditions aren’t right, it could really screw you. For example, if your slave has only a single disk, or if you use MyISAM on the slave, you’ll probably just cause problems for yourself. You need to know your hardware and your workload really well. That’s why Paul didn’t release his code, and I’ve hesitated for the same reason.

There’s more information about this in the upcoming High Performance MySQL, 2nd Edition, which I’m helping to write. We also have a lot more information about how to understand your hardware and workload. There’s no way I can fit it all into this post, and I don’t want to try. Even if I weren’t working like a mad dog on the book and had time to put it here, I can’t give away all the book’s goodies, can I? :-)

More progress on High Performance MySQL, Second Edition

Whew! I just finished a marathon of revisions. It’s been a while since I posted about our progress, so here’s an update for the curious readers.

I just finished revising the last two major chapters that Peter Zaitsev hasn’t yet reviewed. Peter has been essentially going through the chapters like a very thorough technical reviewer. He makes corrections, points out where things aren’t clear or need examples, and adds more material.

By “finished revising,” I mean finished expanding the outline into a full chapter. We’re still working at the level of “this chapter is mostly there, but we might decide to revise it more.” We will most certainly do so in many cases. There are some chunks of material that I’ve marked TODO to put into other chapters, for example. We’re not at the level of a final draft with any chapter except the chapter on MySQL’s architecture, but we’re getting close with the others now.

Most of the chapters are in tech review now, and we’ve gotten a few of them back. The comments from the reviewers have been very helpful. We expanded the Replication chapter quite a bit after tech review. (And then Peter reviewed it and we expanded it even more). When the tech reviewers return comments on the other chapters, we’ll revise some more.

We’re up to 529 pages in OpenOffice.org now. At my calculated ratio of 1 page = 1.1 pages in print, that’s about 582 pages in print. And that’s not counting the Replication chapter, which doesn’t have all of its illustrations yet. I predicted we’d break 500 pages; we might get close to 600. These are very, very densely written, too. No offense to the first edition, but the tone is quite different; much less light-hearted banter, much more compressed information. Peter is a walking encyclopedia, and never seems to run out of details we really ought to include because they’re important (and they are).

We may, or may not, go to production in the next few weeks. Regardless, I think we’re still on track to have the book on shelves by the MySQL Conference & Expo in April. Look for me there. I’ll be easy to find: I’ll be the tall guy with a permanent silly grin. (You’d grin too if you finished writing a book that’s been this much work!)

I’ve posted rough outlines for many of the other chapters. The two Peter and I just finished working on are the Scaling/HA/Load-Balancing/Failover chapter, and the Application-Level Optimization chapter. The Scaling/HA chapter is pretty long and very involved, and goes into a lot of detail on scaling in particular, especially horizontal scaling via sharding. (We use “sharding” because it’s less confusing than calling it “partitioning,” which already means too many different things in databases).

The Application-Level Optimization chapter is a little shorter. It’s mostly about caching strategies, how to make a web server run well, and so on. These aren’t what the book focuses on directly, but you can either help or hurt the database server a lot with your application design. Our goal here is to help people avoid the common mistakes.

For the curious, here’s the current outline for these two chapters:

Scaling and High Availability
  Terminology
  Scaling MySQL
    Planning for Scalability
    Buying Time Before Scaling
    Scaling Up
    Scaling Out
      Functional Partitioning
      Data Sharding
      Choosing a Partitioning Key
        Multiple Partitioning Keys
      Querying Across Shards
      Allocating Data, Shards, and Nodes
        Arranging Shards on Nodes
      Fixed Allocation
      Dynamic Allocation
        Mixing Dynamic and Fixed Allocation
      Explicit Allocation
      Sidebar: Re-Balancing Shards
      Tools for Sharding
    Scaling Back
      Keeping Active Data Separate
    Scaling by Clustering
      Clustering
      Federation
  Load Balancing
    Connecting Directly
      Splitting Reads and Writes in Replication
      Changing Application Configuration
      Changing DNS Names
      Moving IP Addresses
    Introducing a Middleman
      MySQL Proxy
      Load Balancers
    Load Balancing Algorithms
      Adding and Removing Servers in the Pool
    Load Balancing with a Master and Multiple Slaves
  High Availability
    Planning for High Availability
    Adding Redundancy
      Shared-Storage Architectures
      Replicated-Disk Architectures
      Synchronous MySQL Replication
    Failover and Failback
      Promoting a Slave or Switching Roles
      Virtual IP Addresses or IP Takeover
      MySQL Master-Master Replication Manager
      Middleman Solutions
      Handling Failover in the Application

And here’s the outline for the Application-Level Optimization chapter:

Application-Level Optimization
  Application Performance Overview
    Find the Source of the Problem
    Look for Common Problems
  Web Server Issues
    Finding the Optimal Concurrency
  Caching
    Sidebar: Caching Doesn't Always Help
    Caching Below the Application
    Application-Level Caching
    Cache Control Policies
    Cache Object Hierarchies
    Pre-Generating Content
  Extending MySQL
  Alternatives to MySQL

The thing that makes me the happiest right now is that we’re clearly going to make it. For a while, there was just so much work left to do that it was impossible to estimate how much. (Ask my wife: I was wrong many times when she asked how long it would take me to finish a chapter). I also didn’t know how much revision would be necessary, which is very scary; revising takes about four times as long as writing a first draft, by my reckoning. At this point, the remaining work is much smaller, and much easier to estimate. And now I no longer flip-flop daily between “I think we can, I think we can” and “please don’t ask, because I don’t know and I want a vacation.”

Subversion shows me that Peter has the Security chapter locked right now. This one is not a huge one, and Arjen Lentz has already reviewed it as well, so I don’t expect it to be a huge amount of work to revise. After that, it’s minor chapters and appendices. (We might actually convert the chapters on Server Status and Tools into appendices, since they got cannibalized when we realized their material fit better elsewhere. They also don’t have a very chapter-ish feel; they feel more like appendices). We’ve added a few more appendices, including one on EXPLAIN and one on debugging server and storage-engine locking problems. These are all great reference material.

See you at the conference in April!
