Xaprb

Stay curious!

Archive for the ‘optimization’ tag

High Performance MySQL, Second Edition: Query Performance Optimization

with 6 comments

Your comments on the Advanced MySQL Features chapter were great. A lot of the questions I got (in the comments and via email) about chapter 6 are really addressed in chapter 5, “Query Performance Optimization,” so I’m posting its outline too. I have the same questions: are there things you’d like to see us cover? Do you have any favorite techniques you’d like to see us include? Any other comments or questions?

A bit of basic chapter information: It’s currently 45 pages and very nearly a first draft, I think. The outline might be subject to change if we decide to rearrange the material, which actually might help in some places. As I pulled out the headings for this article, I noticed that some of the sections are just huge, and might benefit from being divided up a little bit more. But that’s all going to come out in the wash; the focus right now is getting the material down in some order that makes sense, and making sure it’s clear, concise, and correct.

[Update: I got curious how the page count in OpenOffice.org will translate to the page count on paper, so I opened one of the chapters from the first edition and it looks like every 10 pages in OpenOffice.org will be about 11 pages on paper. So this chapter really ought to be about 50 pages on paper].

Here’s the chapter outline:

[Intro]
Slow Query Basics: Optimize Data Access
  Are You Asking the Database For Data You Don't Need?
  Is MySQL Examining Too Much Data?
    Rows Examined and Rows Returned
Ways to Restructure Queries
  Complex Queries Vs. Many Queries
  Chopping Up a Query
  Join Decomposition
Query Execution Basics
  Query States
  The Query Cache
  The Query Compiler and Optimizer
    The MySQL Query Optimizer
    Table and Index Statistics
    MySQL's Join Execution Strategy
    The Join Optimizer
  Returning Results to the Client
MySQL Query Optimizer Limitations
  Correlated Subqueries
  When a Correlated Subquery is Good
  UNION limitations
  Index Merge Optimizations
  Equality Propagation
  Parallel Execution
  Hash Joins
  Loose Index Scans
  MIN() and MAX()
  SELECT and UPDATE on the same table
Optimizing Specific Types of Queries
  Optimizing COUNT() Queries
    What COUNT() Does
    Myths About MyISAM
    Simple Optimizations
    More Complex Optimizations
  Optimizing JOIN Queries
  Optimizing Subqueries
  Optimizing GROUP BY and DISTINCT
    Optimizing GROUP BY WITH ROLLUP
  Optimizing ORDER BY
  Optimizing LIMIT and OFFSET
  Optimizing UNION
  Optimizing Range Queries
Query Optimizer Hints
User-Defined Variables

Written by Xaprb

October 7th, 2007 at 4:05 pm

Posted in Uncategorized

Tagged with , , ,

High Performance MySQL, Second Edition: Advanced SQL Functionality

with 10 comments

Work continues apace on High Performance MySQL, Second Edition (the link leads to the chapter outline). I’m working now on Chapter 6, Advanced SQL Functionality, and thought I’d solicit input on it. Are there things you’d like to see us cover? Do you have any favorite techniques you’d like to see us include? Feel free to leave feedback in the comments. The chapter is already significantly done, with 26 pages written, but the ink’s not on paper yet, so there’s still time to correct omissions!

I should note that there are separate chapters on architecture, schema and indexing design, application optimization, query optimization, etc. We’re trying to focus this chapter on “advanced features” and how to get high performance out of them. Also, the outline is still subject to change: there’s so much material that it’s hard to decide the best place to put something, and exactly what is an “advanced feature” might be open to interpretation, or we might feel something is better placed elsewhere. That said, please throw your ideas at us and we’ll worry about the details for you. Here’s the outline of this chapter, as of three minutes ago:

[Intro]
The MySQL Query Cache
  Operational Detail and Caveats
  Optimizations
  Alternatives
Prepared Statements and Multiple Query Execution
  Client-side Prepared Statements
  Client-side Prepared Statements
  Optimization of the Execution Plan
  Server-side Prepared Statements
  Prepared Statements in Stored Procedures
  Prepared Statement Caveats
Cursors
  Client-side Cursors
Stored Procedures
  Pros and Cons
  Recommendation
  Example
User Defined Functions
Events
Views
  Updatable Views
  Limitations of Updatable Views
  Security
  Performance
Triggers
Working with Multi-byte Character Sets
Full-Text Search
  Full-Text Search Queries
  Changes in MySQL 5.1
  Full-Text Trade-Offs and Workarounds
  Full-Text Tuning and Optimization
Foreign Key Constraints
Merge Tables and Partitioning
  Merge Tables
    Merge Table Performance Limitations
    Merge Table Strengths
  Partitioned Tables
    Optimizing Queries Against Partitioned Tables
    Maintaining Partitioned Tables
Distributed Transactions (XA)

Written by Xaprb

October 5th, 2007 at 11:11 am

Posted in Uncategorized

Tagged with , , ,

How to check and optimize MySQL tables in parallel

with one comment

I wanted to point out something that might not be obvious from the name: MySQL Parallel Dump can be used as a generic wrapper to discover tables and databases, and fork off worker processes to do something to them in parallel. That “something” can easily be invoking mysqlcheck — or any other program. This makes it really easy for you to do multi-threaded whatever-you-need-to-do on MySQL tables. Here’s how:

mysql-parallel-dump [options] -- 'mysqlcheck --optimize %D %N'

There are several things going on here:

  1. You’re running mysql-parallel-dump with all the ordinary options. Some of them are really specific to dumping data, but not all that many — most of the options are about choosing which databases to include and exclude, and so on.
  2. You’re adding a double dash -- to make it stop processing any further options.
  3. The rest of the arguments are being treated as a system command, but…
  4. Not before interpolating the database and table name into them. The %D and %N are a little macro language. There are some other macros too — see the documentation.

The net effect is to loop through all the tables and run OPTIMIZE TABLE on them.

MySQL Parallel Dump takes responsibility for noticing the exit status of the system command, keeping track of times, and reporting it all when it’s done. And its functionality for working on sets of things is also generic. You could easily create a table of “optimization jobs” and point it at that table, perhaps using the --age option, and it would obediently do what the table’s contents specify:

mysql> select setname, db, tbl from test.opti_job;
+-----------+--------+------------+
| setname   | db     | tbl        |
+-----------+--------+------------+
| dvd_store | sakila | film       | 
| dvd_store | sakila | film_actor | 
| set1      | test   | t1         | 
| set1      | test   | t2         | 
+-----------+--------+------------+
$ mysql-parallel-dump --nolocktables --sets set1,dvd_store --settable test.opti_job -- 'mysqlcheck --optimize %D %N > /dev/null'
        set1:              2 tables,     2 chunks,     2 successes,  0 failures,  0.14 wall-clock time,  0.17 dump time
   dvd_store:              2 tables,     2 chunks,     2 successes,  0 failures,  0.51 wall-clock time,  0.85 dump time
Final result:  2 sets,     4 tables,     4 chunks,     4 successes,  0 failures,  0.65 wall-clock time,  1.02 dump time

Much of the code for any kind of parallel tool is generic. I put a little extra time into this tool to make that code reusable, not special-purpose.

Written by Xaprb

October 3rd, 2007 at 10:43 pm

Posted in Uncategorized

Tagged with , , , , ,