Xaprb

Stay curious!

Archive for the ‘load-balancing’ tag

More progress on High Performance MySQL, Second Edition

with 6 comments

Whew! I just finished a marathon of revisions. It’s been a while since I posted about our progress, so here’s an update for the curious readers.

I just finished revising the last two major chapters that Peter Zaitsev hasn’t yet reviewed. Peter has been essentially going through the chapters like a very thorough technical reviewer. He makes corrections, points out where things aren’t clear or need examples, and adds more material.

By “finished revising,” I mean finished expanding the outline into a full chapter. We’re still working at the level of “this chapter is mostly there, but we might decide to revise it more.” We will most certainly do so in many cases. There are some chunks of material that I’ve marked TODO to put into other chapters, for example. We’re not at the level of a final draft with any chapter except the chapter on MySQL’s architecture, but we’re getting close with the others now.

Most of the chapters are in tech review now, and we’ve gotten a few of them back. The comments from the reviewers have been very helpful. We expanded the Replication chapter quite a bit after tech review. (And then Peter reviewed it and we expanded it even more). When the tech reviewers return comments on the other chapters, we’ll revise some more.

We’re up to 529 pages in OpenOffice.org now. At my calculated ratio of 1 page = 1.1 pages in print, that’s about 582 pages in print. And that’s not counting the Replication chapter, which doesn’t have all of its illustrations yet. I predicted we’d break 500 pages; we might get close to 600. These are very, very densely written, too. No offense to the first edition, but the tone is quite different; much less light-hearted banter, much more compressed information. Peter is a walking encyclopedia, and never seems to run out of details we really ought to include because they’re important (and they are).

We may, or may not, go to production in the next few weeks. Regardless, I think we’re still on track to have the book on shelves by the MySQL Conference & Expo in April. Look for me there. I’ll be easy to find: I’ll be the tall guy with a permanent silly grin. (You’d grin too if you finished writing a book that’s been this much work!)

I’ve posted rough outlines for many of the other chapters. The two Peter and I just finished working on are the Scaling/HA/Load-Balancing/Failover chapter, and the Application-Level Optimization chapter. The Scaling/HA chapter is pretty long and very involved, and goes into a lot of detail on scaling in particular, especially horizontal scaling via sharding. (We use “sharding” because it’s less confusing than calling it “partitioning,” which already means too many different things in databases).

The Application-Level Optimization chapter is a little shorter. It’s mostly about caching strategies, how to make a web server run well, and so on. These aren’t what the book focuses on directly, but you can either help or hurt the database server a lot with your application design. Our goal here is to help people avoid the common mistakes.

For the curious, here’s the current outline for these two chapters:

Scaling and High Availability
  Terminology
  Scaling MySQL
    Planning for Scalability
    Buying Time Before Scaling
    Scaling Up
    Scaling Out
      Functional Partitioning
      Data Sharding
      Choosing a Partitioning Key
        Multiple Partitioning Keys
      Querying Across Shards
      Allocating Data, Shards, and Nodes
        Arranging Shards on Nodes
      Fixed Allocation
      Dynamic Allocation
        Mixing Dynamic and Fixed Allocation
      Explicit Allocation
      Sidebar: Re-Balancing Shards
      Tools for Sharding
    Scaling Back
      Keeping Active Data Separate
    Scaling by Clustering
      Clustering
      Federation
  Load Balancing
    Connecting Directly
      Splitting Reads and Writes in Replication
      Changing Application Configuration
      Changing DNS Names
      Moving IP Addresses
    Introducing a Middleman
      MySQL Proxy
      Load Balancers
    Load Balancing Algorithms
      Adding and Removing Servers in the Pool
    Load Balancing with a Master and Multiple Slaves
  High Availability
    Planning for High Availability
    Adding Redundancy
      Shared-Storage Architectures
      Replicated-Disk Architectures
      Synchronous MySQL Replication
    Failover and Failback
      Promoting a Slave or Switching Roles
      Virtual IP Addresses or IP Takeover
      MySQL Master-Master Replication Manager
      Middleman Solutions
      Handling Failover in the Application

And here’s the outline for the Application-Level Optimization chapter:

Application-Level Optimization
  Application Performance Overview
    Find the Source of the Problem
    Look for Common Problems
  Web Server Issues
    Finding the Optimal Concurrency
  Caching
    Sidebar: Caching Doesn't Always Help
    Caching Below the Application
    Application-Level Caching
    Cache Control Policies
    Cache Object Hierarchies
    Pre-Generating Content
  Extending MySQL
  Alternatives to MySQL

The thing that makes me the happiest right now is that we’re clearly going to make it. For a while, there was just so much work left to do that it was impossible to estimate how much. (Ask my wife: I was wrong many times when she asked how long it would take me to finish a chapter). I also didn’t know how much revision would be necessary, which is very scary; revising takes about four times as long as writing a first draft, by my reckoning. At this point, the remaining work is much smaller, and much easier to estimate. And now I no longer flip-flop daily between “I think we can, I think we can” and “please don’t ask, because I don’t know and I want a vacation.”

Subversion shows me that Peter has the Security chapter locked right now. This one is not a huge one, and Arjen Lentz has already reviewed it as well, so I don’t expect it to be a huge amount of work to revise. After that, it’s minor chapters and appendices. (We might actually convert the chapters on Server Status and Tools into appendices, since they got cannibalized when we realized their material fit better elsewhere. They also don’t have a very chapter-ish feel; they feel more like appendices). We’ve added a few more appendices, including one on EXPLAIN and one on debugging server and storage-engine locking problems. These are all great reference material.

See you at the conference in April!

High Performance MySQL, Second Edition: Replication, Scaling and High Availability

with 5 comments

Continuing in the tradition, which I hope has been as helpful to you as it has been to me, I’m opening the floor for suggestions on chapter 9 of the upcoming High Performance MySQL, Second Edition. Unlike the other chapters for which I’ve listed outlines, this one isn’t substantially written yet. It’s in detailed outline form at this point (a tactic that has worked very well for us so far — I’ll write about that someday).

I’m trying to get feedback much earlier in this chapter’s lifecycle, for several reasons. Two of the most important are that this is one of the first chapters I’ve had a chance to really take from scratch, and the chapters I haven’t written from scratch have been harder to organize, as you’ve probably seen from the last few outlines I posted. There’s a lot of value in working top-down on this deep encyclopedia-style material.

The outline, as it stands now, is basically headings with bulleted lists of important details. Here are the top-level headings:

[Intro]
Scaling and High Availability Requirements
Replication Overview
Configuring Replication
Under the Hood of Replication
Replication Topologies
Replication Administration and Maintenance
Replication Problems and Solutions
The Future of MySQL Replication
Scaling MySQL Horizontally
Clustering with MySQL
   MySQL Cluster
   Other Clustering Solutions
Load Balancing

Just a few notes. These sections are top-level, and will likely be split into many sub-sections like other chapter outlines I’ve posted. A typical section has a couple dozen bullet-points in it, at a high level of granularity, such as “Using DRBD for log replication only.” I think we’ll also add in a separate section on fail-over and fail-back, but that’s not in the outline as of right now (what do you think belongs in it?).

I don’t know what it’s like for you to read outlines and see little bits of the book being assembled, but the process of writing this book is just fascinating to me. It’s endlessly interesting and educational — just the process of writing, let alone the subject matter! This is a really fun project. A heck of a lot of work, but fun nonetheless, and the openness of the project makes it even more fun for me. I’ve learned a lot of surprising and interesting things about writing. I keep wishing I had time to write about this process, but I really need to keep my eye on the deadlines and put that off for later.

Anyway, the usual requests apply: what’s missing, what do you think is cool and should be included, etc etc? Thanks, as usual, for your time and feedback.

Written by Xaprb

October 18th, 2007 at 9:59 pm