High Performance MySQL, Second Edition: Schema Optimization and Indexing

I’ve been trying to circle back and clean up things I left for later in several chapters of High Performance MySQL, second edition. This includes a lot of material in chapter 4, Schema Optimization and Indexing. At some point I’ll write more about the process of writing this book, and what we’ve done well and what we’ve learned to do better, but for right now I wanted to complete the picture of what material we have on schema, index, and query optimization. The last two chapters I’ve written about (Query Performance Optimization and Advanced MySQL Features) have generated lots of feed back along the lines of “don’t forget X!” to which I’m obliged to reply “It’s in a different chapter.”

The truth is, it’s difficult to separate these topics sensibly. I’d like to do it in the mythical “perfect” way that serializes into a nice narrative without cross-references, but even the perfectionist in me wilts under the glare of deadlines. As a result, I don’t know if it’s really possible for us to completely avoid cross-references. (I do know there’s room for improvement in how we’ve arranged the material, but I’ve spent a lot of the day today trying to de-dupe some topics we wrote about in two places, and I’m coming to appreciate that re-organizing is an extraordinary amount of work, especially in OpenOffice.org — but more on that later).

All this is a preface to the following sentence: schema, indexing, advanced features, and query optimization are intermingled to some extent in the three chapters, even though we tried to separate the topics sensibly. I haven’t yet taken some of the suggestions I got in comments on the last chapter I posted. Like I said, reorganizing is a lot of work :-)

Here’s the outline. I have the same kinds of questions as before: what are we forgetting, do you have any questions or topics you’d like us to cover, etc? Comments are welcome.

[Update: I forgot to mention the vital statistics. So far it’s about 55 pages printed.]

[Intro]
Choosing Optimal Data Types
  General Guidelines for Data Storage
    Smaller is Usually Better
    To NULL or not to NULL?
    Choose Identifiers Carefully
  How to Choose a Good Data Type
    Numeric Types
    BIT Strings
    String Types
      [sidebar: Generosity can be Unwise]
    BLOB and TEXT Types
      [sidebar: How to Avoid On-Disk Temporary Tables]
    Using ENUM Instead of a String Type
    Date and Time Types
      [sidebar: Watch out for automatic migration programs]
Indexing Basics
  Types of Indexes
    BTREE Indexes
      Types of Queries that can Use a BTREE Index
      Indexed Column Isolation
    Prefix Indexes
    HASH Indexes
    Rolling Your Own HASH Indexes
    RTREE Indexes
    FULLTEXT Indexes
    Clustered Indexes
    Covering Indexes
  Index Scans and Using Indexes for Sorting
  Packed (Prefix-Compressed) Indexes
  Redundant and Duplicate Indexes
  Indexes and Locking
  Indexing Strategies
  An Indexing Case Study
    Supporting Many Kinds of Filtering
    Avoiding Multiple Range Conditions
    Optimizing Sorts
  Index and Table Maintenance 
    Finding and Repairing Table Corruption
    Updating Index Statistics
    Reducing Index Fragmentation
Normalization and Denormalization
  Pros and Cons of a Normalized Schema 
  Pros and Cons of a Denormalized Schema
  A Mixture of Normalized and Denormalized
  Cache and Summary Tables
    [sidebar: The Principle of Faster SELECT and Slower UPDATE]
Notes on Storage Engines
  MyISAM
  Memory
  InnoDB

Here’s a snippet of “what it’s like to write this book” that I’ll throw out there. OpenOffice.org, at least the version I’m using, doesn’t like O’Reilly’s custom heading styles and won’t show me an outline view of the document. I’m copying and pasting into this blog post by scrolling from one heading to the next. This is always enlightening, because as you can see a lot of the material isn’t organized correctly in the hierarchy. Guess what, it’s my first look at the chapter’s real outline, too! This isn’t the outline we planned to have, but the chapter evolved because of making localized changes without any real way to zoom out and make sure the outline still made sense. So my two comments on this are a) OpenOffice.org hasn’t been the most helpful tool in some ways and b) these blog posts are, to some extent, airing the project’s dirty laundry (illogical outlining, difficult separation of material among chapters, etc). I’m not afraid of that; I think it’s healthy and will help the book be better as a result. I guess my experience with open source, combined with my employer’s open-books policy, has taught me to embrace transparency instead of fearing it. In the end this material will be organized and make a lot of sense, but that’s a process of evolution — not intelligent design.

As I said, at some point I’ll write more about the process of writing. It’s been educational, and most bloggers I know who’ve written a book don’t say much about it (they just pop their heads up every now and then to apologize for not blogging). Very briefly: if you dream of writing a book, do it. It helps that my boss and co-workers support me in this venture, but it’s worth it regardless.

Technorati Tags:, , , , ,

You might also like:

  1. Progress report on High Performance MySQL, Second Edition
  2. Organizing High Performance MySQL, 2nd Edition
  3. High Performance MySQL, Second Edition: Advanced SQL Functionality
  4. High Performance MySQL, Second Edition: Backup and Recovery
  5. High Performance MySQL, Second Edition: Query Performance Optimization

5 Responses to “High Performance MySQL, Second Edition: Schema Optimization and Indexing”


  1. 1 david

    How are you going to address the differences between storage engines, e.g. on index types?

    The current version of the book only adds notes to the end of chapters, which don’t really go into detail. I would like to read about difference as I’m reading through the chapter, so that I know right away that feature x cannot be used with innodb.

  2. 2 Xaprb

    Right, we are mentioning limitations and restrictions throughout as they apply. We have so much more detail in this edition — I’d estimate at least ten times as much information — it would be impossible to simply add detail to the first book and get a usable book.

    That’s actually one of the biggest challenges with the book. Figuring out how to make a brain-dump from four people into a readable narrative and usable technical reference is hard.

  3. 3 psteve

    Jay Pipes’ presentation at http://jpipes.com/presentations/performance-coding/performance-coding-soup-to-nuts.pdf had some very interesting things to say about vertical partitioning (and horizontal partitioning as well). Are you going to be discussing partitioning? Vertical partitioning seems relevant to this chapter.

  4. 4 Xaprb

    Partitioning is discussed in several chapters, yes. Thanks for asking about it.

  5. 5 David Rasch

    This might be a great place to cover the process of designing a schema including a good list of:
    1. what questions you intend to ask the schema
    2. what you know about the growth of the given tables
    (seems like this barebones template might fit well in the case study)

    It might also be worth mentioning the storage of IP addresses as a LONGINT as this seems to be an atypical encoding in practice.

    I love the comments about “smaller is better”, but please don’t forget the pros/cons of static and dynamic row lengths.

    Lastly, but not least, resist the urge to have an explicit primary key if there really isn’t one (not so helfpul in InnoDB, but it can save lots of space on MyISAM).

Leave a Reply

Please do not use this blog to get help with problems or bugs in Maatkit or innotop: use the Sourceforge forums, mailing list, or bug trackers. If you're asking for help with MySQL, please use the MySQL mailing list instead. I'm writing a book and my time is extremely limited :-)