Xaprb

Stay curious!

High Performance MySQL, Second Edition: Query Performance Optimization

with 6 comments

Your comments on the Advanced MySQL Features chapter were great. A lot of the questions I got (in the comments and via email) about chapter 6 are really addressed in chapter 5, “Query Performance Optimization,” so I’m posting its outline too. I have the same questions: are there things you’d like to see us cover? Do you have any favorite techniques you’d like to see us include? Any other comments or questions?

A bit of basic chapter information: It’s currently 45 pages and very nearly a first draft, I think. The outline might be subject to change if we decide to rearrange the material, which actually might help in some places. As I pulled out the headings for this article, I noticed that some of the sections are just huge, and might benefit from being divided up a little bit more. But that’s all going to come out in the wash; the focus right now is getting the material down in some order that makes sense, and making sure it’s clear, concise, and correct.

[Update: I got curious how the page count in OpenOffice.org will translate to the page count on paper, so I opened one of the chapters from the first edition and it looks like every 10 pages in OpenOffice.org will be about 11 pages on paper. So this chapter really ought to be about 50 pages on paper].

Here’s the chapter outline:

[Intro]
Slow Query Basics: Optimize Data Access
  Are You Asking the Database For Data You Don't Need?
  Is MySQL Examining Too Much Data?
    Rows Examined and Rows Returned
Ways to Restructure Queries
  Complex Queries Vs. Many Queries
  Chopping Up a Query
  Join Decomposition
Query Execution Basics
  Query States
  The Query Cache
  The Query Compiler and Optimizer
    The MySQL Query Optimizer
    Table and Index Statistics
    MySQL's Join Execution Strategy
    The Join Optimizer
  Returning Results to the Client
MySQL Query Optimizer Limitations
  Correlated Subqueries
  When a Correlated Subquery is Good
  UNION limitations
  Index Merge Optimizations
  Equality Propagation
  Parallel Execution
  Hash Joins
  Loose Index Scans
  MIN() and MAX()
  SELECT and UPDATE on the same table
Optimizing Specific Types of Queries
  Optimizing COUNT() Queries
    What COUNT() Does
    Myths About MyISAM
    Simple Optimizations
    More Complex Optimizations
  Optimizing JOIN Queries
  Optimizing Subqueries
  Optimizing GROUP BY and DISTINCT
    Optimizing GROUP BY WITH ROLLUP
  Optimizing ORDER BY
  Optimizing LIMIT and OFFSET
  Optimizing UNION
  Optimizing Range Queries
Query Optimizer Hints
User-Defined Variables

Further Reading:

Written by Xaprb

October 7th, 2007 at 4:05 pm

Posted in Uncategorized

Tagged with , , ,

6 Responses to 'High Performance MySQL, Second Edition: Query Performance Optimization'

Subscribe to comments with RSS

  1. Based on the outline I think this chapter is actually two chapters combined, one to discuss the optimizer and the theory behind it, and the second one is the optimization discussion. It’s the same as cookbooks – the “theory” and cooking how to is different section/chapter from the list of recipes.

    What I’d like to see is different optimization rules based on the volume of data, from my experience (mostly Oracle) one should take different approach for optimizing based on the actual volume.

    Ran H

    7 Oct 07 at 6:02 pm

  2. That’s a useful observation. You may have put your finger on a back-of-my-mind feeling about the chapter.

    There’s a lot of stuff in chapter 7, Optimizing Server Settings, which talks about volume of data. I think in general we assume big datasets and/or heavy load. I will run this by the other authors and see what they say — frankly they have more experience with large-volume installations than I do (I manage data in the 10s to 100s of GB regularly, but not TB).

    Xaprb

    7 Oct 07 at 8:33 pm

  3. Hmm, I presume stuff like slow query log, explain etc. are discussed in another chapter because I do not see where they would fit in the outline? I have done a number of talks on generic (though with a MySQL slant) talk on this topic. You might find my slides useful:
    http://pooteeweet.org/slides

    Although the book is a bit dated and alot of the comments about MySQL are no longer valid, I still recommend “SQL Performance Tuning” for its general approach and deep understanding. If you do not have this book yet, I suggest to at least flip through it to see how they approach explaining the various areas in SQL optimization

    Lukas

    8 Oct 07 at 5:57 am

  4. Hi Lukas,

    That’s because the outline has huge sections that should really be split into subsections, I think :-)

    Giuseppe Maxia also recommended SQL Performance Tuning. I have a copy but haven’t read it in a while. Thanks for the reminder.

    Xaprb

    8 Oct 07 at 8:57 am

  5. [...] we have on schema, index, and query optimization. The last two chapters I’ve written about (Query Performance Optimization and Advanced MySQL Features) have generated lots of feed back along the lines of “don’t [...]

  6. I have a table that contains duplicate data.

    Some more clarifications on the data set. The table consists of:
    id, url, …a bunch of stored data for that url…, timestamp

    The user can store historical data about different urls in the table. so it may look something like this as time goes on:

    id ¦ url ¦ …stored data … ¦ timestamp
    1 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-04 13:03:12
    2 ¦ http://www.webmasterworld.com ¦ …blah… ¦ 2005-08-04 13:33:12
    3 ¦ http://www.cnn.com ¦ …blah… ¦ 2005-08-04 15:03:12
    4 ¦ http://www.cnn.com ¦ …blah… ¦ 2005-08-06 10:00:02
    5 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-10 13:03:12
    6 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-11 13:03:12
    7 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-12 13:03:12
    8 ¦ http://www.msn.com ¦ …blah… ¦ 2005-08-20 13:13:12
    9 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-31 13:03:12

    The query I am trying to write needs to SELECT the last entry for a url (latest date or greatest id) relative to the other rows for the same url. (the last entry for mydomain.com for example). (This is why I was using GROUP BY, so that I would only get one row per url)

    The query needs to return the most recent entry for each url.

    id ¦ url ¦ …stored data … ¦ timestamp
    2 ¦ http://www.webmasterworld.com ¦ …blah… ¦ 2005-08-04 13:33:12
    4 ¦ http://www.cnn.com ¦ …blah… ¦ 2005-08-06 10:00:02
    8 ¦ http://www.msn.com ¦ …blah… ¦ 2005-08-20 13:13:12
    9 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-31 13:03:12

    Does this give a more clear picture of the query I am trying to create?

    I am trying to pull all of the most recent rows from the table

    SELECT * FROM `data` GROUP BY ‘name’ ORDER BY ‘timestamp’

    Here im getting first group by records and on that result it will apply order by. But I want to apply order by first and on the first record it should apply group by.

    To over come above issue, I’m trying below query
    SELECT data.* FROM data INNER JOIN (SELECT MAX(id) AS id FROM data GROUP BY url) ids ON data.id = ids.id

    I’m getting some problem in performance as sub query always executed slow.

    If there is a simpler/more efficient way to do this – I would love to know.

    Any help would be appreciated.

    Amit Shah

    15 Dec 08 at 6:53 am

Leave a Reply