Tag Archive for 'optimization'

More progress on High Performance MySQL, Second Edition

Whew! I just finished a marathon of revisions. It’s been a while since I posted about our progress, so here’s an update for the curious readers.

I just finished revising the last two major chapters that Peter Zaitsev hasn’t yet reviewed. Peter has been essentially going through the chapters like a very thorough technical reviewer. He makes corrections, points out where things aren’t clear or need examples, and adds more material.

By “finished revising,” I mean finished expanding the outline into a full chapter. We’re still working at the level of “this chapter is mostly there, but we might decide to revise it more.” We will most certainly do so in many cases. There are some chunks of material that I’ve marked TODO to put into other chapters, for example. We’re not at the level of a final draft with any chapter except the chapter on MySQL’s architecture, but we’re getting close with the others now.

Most of the chapters are in tech review now, and we’ve gotten a few of them back. The comments from the reviewers have been very helpful. We expanded the Replication chapter quite a bit after tech review. (And then Peter reviewed it and we expanded it even more). When the tech reviewers return comments on the other chapters, we’ll revise some more.

We’re up to 529 pages in OpenOffice.org now. At my calculated ratio of 1 page = 1.1 pages in print, that’s about 582 pages in print. And that’s not counting the Replication chapter, which doesn’t have all of its illustrations yet. I predicted we’d break 500 pages; we might get close to 600. These are very, very densely written, too. No offense to the first edition, but the tone is quite different; much less light-hearted banter, much more compressed information. Peter is a walking encyclopedia, and never seems to run out of details we really ought to include because they’re important (and they are).

We may, or may not, go to production in the next few weeks. Regardless, I think we’re still on track to have the book on shelves by the MySQL Conference & Expo in April. Look for me there. I’ll be easy to find: I’ll be the tall guy with a permanent silly grin. (You’d grin too if you finished writing a book that’s been this much work!)

I’ve posted rough outlines for many of the other chapters. The two Peter and I just finished working on are the Scaling/HA/Load-Balancing/Failover chapter, and the Application-Level Optimization chapter. The Scaling/HA chapter is pretty long and very involved, and goes into a lot of detail on scaling in particular, especially horizontal scaling via sharding. (We use “sharding” because it’s less confusing than calling it “partitioning,” which already means too many different things in databases).

The Application-Level Optimization chapter is a little shorter. It’s mostly about caching strategies, how to make a web server run well, and so on. These aren’t what the book focuses on directly, but you can either help or hurt the database server a lot with your application design. Our goal here is to help people avoid the common mistakes.

For the curious, here’s the current outline for these two chapters:

Scaling and High Availability
  Terminology
  Scaling MySQL
    Planning for Scalability
    Buying Time Before Scaling
    Scaling Up
    Scaling Out
      Functional Partitioning
      Data Sharding
      Choosing a Partitioning Key
        Multiple Partitioning Keys
      Querying Across Shards
      Allocating Data, Shards, and Nodes
        Arranging Shards on Nodes
      Fixed Allocation
      Dynamic Allocation
        Mixing Dynamic and Fixed Allocation
      Explicit Allocation
      Sidebar: Re-Balancing Shards
      Tools for Sharding
    Scaling Back
      Keeping Active Data Separate
    Scaling by Clustering
      Clustering
      Federation
  Load Balancing
    Connecting Directly
      Splitting Reads and Writes in Replication
      Changing Application Configuration
      Changing DNS Names
      Moving IP Addresses
    Introducing a Middleman
      MySQL Proxy
      Load Balancers
    Load Balancing Algorithms
      Adding and Removing Servers in the Pool
    Load Balancing with a Master and Multiple Slaves
  High Availability
    Planning for High Availability
    Adding Redundancy
      Shared-Storage Architectures
      Replicated-Disk Architectures
      Synchronous MySQL Replication
    Failover and Failback
      Promoting a Slave or Switching Roles
      Virtual IP Addresses or IP Takeover
      MySQL Master-Master Replication Manager
      Middleman Solutions
      Handling Failover in the Application

And here’s the outline for the Application-Level Optimization chapter:

Application-Level Optimization
  Application Performance Overview
    Find the Source of the Problem
    Look for Common Problems
  Web Server Issues
    Finding the Optimal Concurrency
  Caching
    Sidebar: Caching Doesn't Always Help
    Caching Below the Application
    Application-Level Caching
    Cache Control Policies
    Cache Object Hierarchies
    Pre-Generating Content
  Extending MySQL
  Alternatives to MySQL

The thing that makes me the happiest right now is that we’re clearly going to make it. For a while, there was just so much work left to do that it was impossible to estimate how much. (Ask my wife: I was wrong many times when she asked how long it would take me to finish a chapter). I also didn’t know how much revision would be necessary, which is very scary; revising takes about four times as long as writing a first draft, by my reckoning. At this point, the remaining work is much smaller, and much easier to estimate. And now I no longer flip-flop daily between “I think we can, I think we can” and “please don’t ask, because I don’t know and I want a vacation.”

Subversion shows me that Peter has the Security chapter locked right now. This one is not a huge one, and Arjen Lentz has already reviewed it as well, so I don’t expect it to be a huge amount of work to revise. After that, it’s minor chapters and appendices. (We might actually convert the chapters on Server Status and Tools into appendices, since they got cannibalized when we realized their material fit better elsewhere. They also don’t have a very chapter-ish feel; they feel more like appendices). We’ve added a few more appendices, including one on EXPLAIN and one on debugging server and storage-engine locking problems. These are all great reference material.

See you at the conference in April!

Technorati Tags:, , , , , , , , , , , , ,

You might also like:

  1. High Performance MySQL, Second Edition: Replication, Scaling and High Availability
  2. Progress report on High Performance MySQL, Second Edition
  3. Progress on High Performance MySQL, Second Edition
  4. High Performance MySQL, Second Edition: Advanced SQL Functionality
  5. High Performance MySQL, Second Edition: Backup and Recovery

Progress report on High Performance MySQL, Second Edition

It’s been a while since I’ve written about progress on the book. I actually stopped working on it as much at the beginning of the month, because on October 31st I managed to finish a first draft of the last big chapter (Scaling and High Availability)! Now I’m back to full-time work at my employer, and I’m working on the book in the evenings and weekends only.

This doesn’t mean the book is close to being done, though. The editor is sending out some chapters for technical review, and there’s still a lot more writing and revising to be done.

Last weekend I revised the Security chapter from the first edition, which I think will be the only chapter that we’ll just revise and update, rather than completely rewriting (well, maybe the Architecture chapter could be considered a revision instead of a rewrite, but it’s a stretch; we changed it a lot). I removed a lot of the material that repeated the MySQL manual, and added a lot of information and best practices on grants, new privileges and objects in MySQL 5, common tasks, common mistakes, and so on. The chapter ended up being nearly as long, even though I stripped out all the code listings and so on from the first edition (in fact, I reduced the first edition’s material to a few paragraphs).

Beyond that, though, there are little details to finish out in many of the chapters. Examples that need to be finished, figures that need to be re-drawn, material that doesn’t quite fit and needs to be re-arranged or even moved to another chapter; it’s a lot of work. Peter Zaitsev has been reviewing some of the core chapters on query and schema optimization etc, and I’m revising them in response to his comments. That’s what I spent today doing.

I think the biggest chunks of work that remain are going to be making chapters 3, 4, 5, 6 and 7 (benchmarking, profiling, schema, indexing, query optimization, advanced features, and server tuning) flow together well. The challenge here is how to organize the vast amount of material so it reads well, without too many forward references, and still be useful as a reference work. The detail we’ve gone into is incredible. It makes it very hard to find the single best place to mention each little bit of wisdom, because all of this material is completely inter-related. It’s tough to flatten the graph of knowledge into a one-dimensional narrative.

It’s not just these chapters that have a lot of inter-related material, of course. It’s hard to talk about tuning the server settings (chapter 7) without bringing the OS and hardware (chapter 8) into it, and whenever you do this you also need to think about measuring and monitoring status information (chapter 14). Of course, you need to do that for benchmarking and profiling, too (chapter 3). I’m sure you see the dilemma!

The good news is, if we succeed in doing this well, you will find the book enormously useful. Stay tuned!

Technorati Tags:, , , , , , , , , , , , , , ,

You might also like:

  1. Organizing High Performance MySQL, 2nd Edition
  2. Progress on High Performance MySQL, Second Edition
  3. High Performance MySQL, Second Edition: Backup and Recovery
  4. High Performance MySQL, Second Edition: Advanced SQL Functionality
  5. More progress on High Performance MySQL, Second Edition

High Performance MySQL, Second Edition: Schema Optimization and Indexing

I’ve been trying to circle back and clean up things I left for later in several chapters of High Performance MySQL, second edition. This includes a lot of material in chapter 4, Schema Optimization and Indexing. At some point I’ll write more about the process of writing this book, and what we’ve done well and what we’ve learned to do better, but for right now I wanted to complete the picture of what material we have on schema, index, and query optimization. The last two chapters I’ve written about (Query Performance Optimization and Advanced MySQL Features) have generated lots of feed back along the lines of “don’t forget X!” to which I’m obliged to reply “It’s in a different chapter.”

The truth is, it’s difficult to separate these topics sensibly. I’d like to do it in the mythical “perfect” way that serializes into a nice narrative without cross-references, but even the perfectionist in me wilts under the glare of deadlines. As a result, I don’t know if it’s really possible for us to completely avoid cross-references. (I do know there’s room for improvement in how we’ve arranged the material, but I’ve spent a lot of the day today trying to de-dupe some topics we wrote about in two places, and I’m coming to appreciate that re-organizing is an extraordinary amount of work, especially in OpenOffice.org — but more on that later).

All this is a preface to the following sentence: schema, indexing, advanced features, and query optimization are intermingled to some extent in the three chapters, even though we tried to separate the topics sensibly. I haven’t yet taken some of the suggestions I got in comments on the last chapter I posted. Like I said, reorganizing is a lot of work :-)

Here’s the outline. I have the same kinds of questions as before: what are we forgetting, do you have any questions or topics you’d like us to cover, etc? Comments are welcome.

[Update: I forgot to mention the vital statistics. So far it’s about 55 pages printed.]

[Intro]
Choosing Optimal Data Types
  General Guidelines for Data Storage
    Smaller is Usually Better
    To NULL or not to NULL?
    Choose Identifiers Carefully
  How to Choose a Good Data Type
    Numeric Types
    BIT Strings
    String Types
      [sidebar: Generosity can be Unwise]
    BLOB and TEXT Types
      [sidebar: How to Avoid On-Disk Temporary Tables]
    Using ENUM Instead of a String Type
    Date and Time Types
      [sidebar: Watch out for automatic migration programs]
Indexing Basics
  Types of Indexes
    BTREE Indexes
      Types of Queries that can Use a BTREE Index
      Indexed Column Isolation
    Prefix Indexes
    HASH Indexes
    Rolling Your Own HASH Indexes
    RTREE Indexes
    FULLTEXT Indexes
    Clustered Indexes
    Covering Indexes
  Index Scans and Using Indexes for Sorting
  Packed (Prefix-Compressed) Indexes
  Redundant and Duplicate Indexes
  Indexes and Locking
  Indexing Strategies
  An Indexing Case Study
    Supporting Many Kinds of Filtering
    Avoiding Multiple Range Conditions
    Optimizing Sorts
  Index and Table Maintenance 
    Finding and Repairing Table Corruption
    Updating Index Statistics
    Reducing Index Fragmentation
Normalization and Denormalization
  Pros and Cons of a Normalized Schema 
  Pros and Cons of a Denormalized Schema
  A Mixture of Normalized and Denormalized
  Cache and Summary Tables
    [sidebar: The Principle of Faster SELECT and Slower UPDATE]
Notes on Storage Engines
  MyISAM
  Memory
  InnoDB

Here’s a snippet of “what it’s like to write this book” that I’ll throw out there. OpenOffice.org, at least the version I’m using, doesn’t like O’Reilly’s custom heading styles and won’t show me an outline view of the document. I’m copying and pasting into this blog post by scrolling from one heading to the next. This is always enlightening, because as you can see a lot of the material isn’t organized correctly in the hierarchy. Guess what, it’s my first look at the chapter’s real outline, too! This isn’t the outline we planned to have, but the chapter evolved because of making localized changes without any real way to zoom out and make sure the outline still made sense. So my two comments on this are a) OpenOffice.org hasn’t been the most helpful tool in some ways and b) these blog posts are, to some extent, airing the project’s dirty laundry (illogical outlining, difficult separation of material among chapters, etc). I’m not afraid of that; I think it’s healthy and will help the book be better as a result. I guess my experience with open source, combined with my employer’s open-books policy, has taught me to embrace transparency instead of fearing it. In the end this material will be organized and make a lot of sense, but that’s a process of evolution — not intelligent design.

As I said, at some point I’ll write more about the process of writing. It’s been educational, and most bloggers I know who’ve written a book don’t say much about it (they just pop their heads up every now and then to apologize for not blogging). Very briefly: if you dream of writing a book, do it. It helps that my boss and co-workers support me in this venture, but it’s worth it regardless.

Technorati Tags:, , , , ,

You might also like:

  1. Progress report on High Performance MySQL, Second Edition
  2. Organizing High Performance MySQL, 2nd Edition
  3. High Performance MySQL, Second Edition: Advanced SQL Functionality
  4. High Performance MySQL, Second Edition: Backup and Recovery
  5. High Performance MySQL, Second Edition: Query Performance Optimization

High Performance MySQL, Second Edition: Query Performance Optimization

Your comments on the Advanced MySQL Features chapter were great. A lot of the questions I got (in the comments and via email) about chapter 6 are really addressed in chapter 5, “Query Performance Optimization,” so I’m posting its outline too. I have the same questions: are there things you’d like to see us cover? Do you have any favorite techniques you’d like to see us include? Any other comments or questions?

A bit of basic chapter information: It’s currently 45 pages and very nearly a first draft, I think. The outline might be subject to change if we decide to rearrange the material, which actually might help in some places. As I pulled out the headings for this article, I noticed that some of the sections are just huge, and might benefit from being divided up a little bit more. But that’s all going to come out in the wash; the focus right now is getting the material down in some order that makes sense, and making sure it’s clear, concise, and correct.

[Update: I got curious how the page count in OpenOffice.org will translate to the page count on paper, so I opened one of the chapters from the first edition and it looks like every 10 pages in OpenOffice.org will be about 11 pages on paper. So this chapter really ought to be about 50 pages on paper].

Here’s the chapter outline:

[Intro]
Slow Query Basics: Optimize Data Access
  Are You Asking the Database For Data You Don't Need?
  Is MySQL Examining Too Much Data?
    Rows Examined and Rows Returned
Ways to Restructure Queries
  Complex Queries Vs. Many Queries
  Chopping Up a Query
  Join Decomposition
Query Execution Basics
  Query States
  The Query Cache
  The Query Compiler and Optimizer
    The MySQL Query Optimizer
    Table and Index Statistics
    MySQL's Join Execution Strategy
    The Join Optimizer
  Returning Results to the Client
MySQL Query Optimizer Limitations
  Correlated Subqueries
  When a Correlated Subquery is Good
  UNION limitations
  Index Merge Optimizations
  Equality Propagation
  Parallel Execution
  Hash Joins
  Loose Index Scans
  MIN() and MAX()
  SELECT and UPDATE on the same table
Optimizing Specific Types of Queries
  Optimizing COUNT() Queries
    What COUNT() Does
    Myths About MyISAM
    Simple Optimizations
    More Complex Optimizations
  Optimizing JOIN Queries
  Optimizing Subqueries
  Optimizing GROUP BY and DISTINCT
    Optimizing GROUP BY WITH ROLLUP
  Optimizing ORDER BY
  Optimizing LIMIT and OFFSET
  Optimizing UNION
  Optimizing Range Queries
Query Optimizer Hints
User-Defined Variables
Technorati Tags:, , ,

You might also like:

  1. High Performance MySQL, Second Edition: Advanced SQL Functionality
  2. High Performance MySQL, Second Edition: Backup and Recovery
  3. High Performance MySQL, Second Edition: Schema Optimization and Indexing
  4. Grab your High Performance MySQL sample content
  5. Coming soon: High Performance MySQL, Second Edition

High Performance MySQL, Second Edition: Advanced SQL Functionality

Work continues apace on High Performance MySQL, Second Edition (the link leads to the chapter outline). I’m working now on Chapter 6, Advanced SQL Functionality, and thought I’d solicit input on it. Are there things you’d like to see us cover? Do you have any favorite techniques you’d like to see us include? Feel free to leave feedback in the comments. The chapter is already significantly done, with 26 pages written, but the ink’s not on paper yet, so there’s still time to correct omissions!

I should note that there are separate chapters on architecture, schema and indexing design, application optimization, query optimization, etc. We’re trying to focus this chapter on “advanced features” and how to get high performance out of them. Also, the outline is still subject to change: there’s so much material that it’s hard to decide the best place to put something, and exactly what is an “advanced feature” might be open to interpretation, or we might feel something is better placed elsewhere. That said, please throw your ideas at us and we’ll worry about the details for you. Here’s the outline of this chapter, as of three minutes ago:

[Intro]
The MySQL Query Cache
  Operational Detail and Caveats
  Optimizations
  Alternatives
Prepared Statements and Multiple Query Execution
  Client-side Prepared Statements
  Client-side Prepared Statements
  Optimization of the Execution Plan
  Server-side Prepared Statements
  Prepared Statements in Stored Procedures
  Prepared Statement Caveats
Cursors
  Client-side Cursors
Stored Procedures
  Pros and Cons
  Recommendation
  Example
User Defined Functions
Events
Views
  Updatable Views
  Limitations of Updatable Views
  Security
  Performance
Triggers
Working with Multi-byte Character Sets
Full-Text Search
  Full-Text Search Queries
  Changes in MySQL 5.1
  Full-Text Trade-Offs and Workarounds
  Full-Text Tuning and Optimization
Foreign Key Constraints
Merge Tables and Partitioning
  Merge Tables
    Merge Table Performance Limitations
    Merge Table Strengths
  Partitioned Tables
    Optimizing Queries Against Partitioned Tables
    Maintaining Partitioned Tables
Distributed Transactions (XA)
Technorati Tags:, , ,

You might also like:

  1. High Performance MySQL, Second Edition: Query Performance Optimization
  2. High Performance MySQL, Second Edition: Backup and Recovery
  3. Progress on High Performance MySQL Backup and Recovery chapter
  4. High Performance MySQL, Second Edition: Schema Optimization and Indexing
  5. Grab your High Performance MySQL sample content

How to check and optimize MySQL tables in parallel

I wanted to point out something that might not be obvious from the name: MySQL Parallel Dump can be used as a generic wrapper to discover tables and databases, and fork off worker processes to do something to them in parallel. That “something” can easily be invoking mysqlcheck — or any other program. This makes it really easy for you to do multi-threaded whatever-you-need-to-do on MySQL tables. Here’s how:

mysql-parallel-dump [options] -- 'mysqlcheck --optimize %D %N'

There are several things going on here:

  1. You’re running mysql-parallel-dump with all the ordinary options. Some of them are really specific to dumping data, but not all that many — most of the options are about choosing which databases to include and exclude, and so on.
  2. You’re adding a double dash -- to make it stop processing any further options.
  3. The rest of the arguments are being treated as a system command, but…
  4. Not before interpolating the database and table name into them. The %D and %N are a little macro language. There are some other macros too — see the documentation.

The net effect is to loop through all the tables and run OPTIMIZE TABLE on them.

MySQL Parallel Dump takes responsibility for noticing the exit status of the system command, keeping track of times, and reporting it all when it’s done. And its functionality for working on sets of things is also generic. You could easily create a table of “optimization jobs” and point it at that table, perhaps using the --age option, and it would obediently do what the table’s contents specify:

mysql> select setname, db, tbl from test.opti_job;
+-----------+--------+------------+
| setname   | db     | tbl        |
+-----------+--------+------------+
| dvd_store | sakila | film       | 
| dvd_store | sakila | film_actor | 
| set1      | test   | t1         | 
| set1      | test   | t2         | 
+-----------+--------+------------+
$ mysql-parallel-dump --nolocktables --sets set1,dvd_store --settable test.opti_job -- 'mysqlcheck --optimize %D %N > /dev/null'
        set1:              2 tables,     2 chunks,     2 successes,  0 failures,  0.14 wall-clock time,  0.17 dump time
   dvd_store:              2 tables,     2 chunks,     2 successes,  0 failures,  0.51 wall-clock time,  0.85 dump time
Final result:  2 sets,     4 tables,     4 chunks,     4 successes,  0 failures,  0.65 wall-clock time,  1.02 dump time

Much of the code for any kind of parallel tool is generic. I put a little extra time into this tool to make that code reusable, not special-purpose.

Technorati Tags:, , , , ,

You might also like:

  1. Introducing MySQL Parallel Restore
  2. Introducing MySQL Parallel Dump
  3. MySQL Toolkit version 1254 released
  4. MySQL Toolkit version 946 released
  5. Maatkit version 1297 released