High Performance MySQL, Second Edition: Query Performance Optimization
Your comments on the Advanced MySQL Features chapter were great. A lot of the questions I got (in the comments and via email) about chapter 6 are really addressed in chapter 5, “Query Performance Optimization,” so I’m posting its outline too. I have the same questions: are there things you’d like to see us cover? Do you have any favorite techniques you’d like to see us include? Any other comments or questions?
A bit of basic chapter information: It’s currently 45 pages and very nearly a first draft, I think. The outline might be subject to change if we decide to rearrange the material, which actually might help in some places. As I pulled out the headings for this article, I noticed that some of the sections are just huge, and might benefit from being divided up a little bit more. But that’s all going to come out in the wash; the focus right now is getting the material down in some order that makes sense, and making sure it’s clear, concise, and correct.
[Update: I got curious how the page count in OpenOffice.org will translate to the page count on paper, so I opened one of the chapters from the first edition and it looks like every 10 pages in OpenOffice.org will be about 11 pages on paper. So this chapter really ought to be about 50 pages on paper].
Here’s the chapter outline:
[Intro]
Slow Query Basics: Optimize Data Access
Are You Asking the Database For Data You Don't Need?
Is MySQL Examining Too Much Data?
Rows Examined and Rows Returned
Ways to Restructure Queries
Complex Queries Vs. Many Queries
Chopping Up a Query
Join Decomposition
Query Execution Basics
Query States
The Query Cache
The Query Compiler and Optimizer
The MySQL Query Optimizer
Table and Index Statistics
MySQL's Join Execution Strategy
The Join Optimizer
Returning Results to the Client
MySQL Query Optimizer Limitations
Correlated Subqueries
When a Correlated Subquery is Good
UNION limitations
Index Merge Optimizations
Equality Propagation
Parallel Execution
Hash Joins
Loose Index Scans
MIN() and MAX()
SELECT and UPDATE on the same table
Optimizing Specific Types of Queries
Optimizing COUNT() Queries
What COUNT() Does
Myths About MyISAM
Simple Optimizations
More Complex Optimizations
Optimizing JOIN Queries
Optimizing Subqueries
Optimizing GROUP BY and DISTINCT
Optimizing GROUP BY WITH ROLLUP
Optimizing ORDER BY
Optimizing LIMIT and OFFSET
Optimizing UNION
Optimizing Range Queries
Query Optimizer Hints
User-Defined Variables
Further Reading:
- High Performance MySQL, Second Edition: Schema Optimization and Indexing
- High Performance MySQL, Second Edition: Advanced SQL Functionality
- High Performance MySQL, Second Edition: Backup and Recovery
- High Performance MySQL, Second Edition: Replication, Scaling and High Availability
- Progress report on High Performance MySQL, Second Edition






Based on the outline I think this chapter is actually two chapters combined, one to discuss the optimizer and the theory behind it, and the second one is the optimization discussion. It’s the same as cookbooks – the “theory” and cooking how to is different section/chapter from the list of recipes.
What I’d like to see is different optimization rules based on the volume of data, from my experience (mostly Oracle) one should take different approach for optimizing based on the actual volume.
Ran H
7 Oct 07 at 6:02 pm
That’s a useful observation. You may have put your finger on a back-of-my-mind feeling about the chapter.
There’s a lot of stuff in chapter 7, Optimizing Server Settings, which talks about volume of data. I think in general we assume big datasets and/or heavy load. I will run this by the other authors and see what they say — frankly they have more experience with large-volume installations than I do (I manage data in the 10s to 100s of GB regularly, but not TB).
Xaprb
7 Oct 07 at 8:33 pm
Hmm, I presume stuff like slow query log, explain etc. are discussed in another chapter because I do not see where they would fit in the outline? I have done a number of talks on generic (though with a MySQL slant) talk on this topic. You might find my slides useful:
http://pooteeweet.org/slides
Although the book is a bit dated and alot of the comments about MySQL are no longer valid, I still recommend “SQL Performance Tuning” for its general approach and deep understanding. If you do not have this book yet, I suggest to at least flip through it to see how they approach explaining the various areas in SQL optimization
Lukas
8 Oct 07 at 5:57 am
Hi Lukas,
That’s because the outline has huge sections that should really be split into subsections, I think :-)
Giuseppe Maxia also recommended SQL Performance Tuning. I have a copy but haven’t read it in a while. Thanks for the reminder.
Xaprb
8 Oct 07 at 8:57 am
[...] we have on schema, index, and query optimization. The last two chapters I’ve written about (Query Performance Optimization and Advanced MySQL Features) have generated lots of feed back along the lines of “don’t [...]
High Performance MySQL, Second Edition: Schema Optimization and Indexing at Xaprb
14 Oct 07 at 9:43 pm
I have a table that contains duplicate data.
Some more clarifications on the data set. The table consists of:
id, url, …a bunch of stored data for that url…, timestamp
The user can store historical data about different urls in the table. so it may look something like this as time goes on:
id ¦ url ¦ …stored data … ¦ timestamp
1 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-04 13:03:12
2 ¦ http://www.webmasterworld.com ¦ …blah… ¦ 2005-08-04 13:33:12
3 ¦ http://www.cnn.com ¦ …blah… ¦ 2005-08-04 15:03:12
4 ¦ http://www.cnn.com ¦ …blah… ¦ 2005-08-06 10:00:02
5 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-10 13:03:12
6 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-11 13:03:12
7 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-12 13:03:12
8 ¦ http://www.msn.com ¦ …blah… ¦ 2005-08-20 13:13:12
9 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-31 13:03:12
…
The query I am trying to write needs to SELECT the last entry for a url (latest date or greatest id) relative to the other rows for the same url. (the last entry for mydomain.com for example). (This is why I was using GROUP BY, so that I would only get one row per url)
The query needs to return the most recent entry for each url.
id ¦ url ¦ …stored data … ¦ timestamp
2 ¦ http://www.webmasterworld.com ¦ …blah… ¦ 2005-08-04 13:33:12
4 ¦ http://www.cnn.com ¦ …blah… ¦ 2005-08-06 10:00:02
8 ¦ http://www.msn.com ¦ …blah… ¦ 2005-08-20 13:13:12
9 ¦ http://www.mydomoain.com ¦ …blah… ¦ 2005-08-31 13:03:12
Does this give a more clear picture of the query I am trying to create?
I am trying to pull all of the most recent rows from the table
SELECT * FROM `data` GROUP BY ‘name’ ORDER BY ‘timestamp’
Here im getting first group by records and on that result it will apply order by. But I want to apply order by first and on the first record it should apply group by.
To over come above issue, I’m trying below query
SELECT data.* FROM data INNER JOIN (SELECT MAX(id) AS id FROM data GROUP BY url) ids ON data.id = ids.id
I’m getting some problem in performance as sub query always executed slow.
If there is a simpler/more efficient way to do this – I would love to know.
Any help would be appreciated.
Amit Shah
15 Dec 08 at 6:53 am