Xaprb

Stay curious!

How to check and optimize MySQL tables in parallel

with one comment

I wanted to point out something that might not be obvious from the name: MySQL Parallel Dump can be used as a generic wrapper to discover tables and databases, and fork off worker processes to do something to them in parallel. That “something” can easily be invoking mysqlcheck — or any other program. This makes it really easy for you to do multi-threaded whatever-you-need-to-do on MySQL tables. Here’s how:

mysql-parallel-dump [options] -- 'mysqlcheck --optimize %D %N'

There are several things going on here:

  1. You’re running mysql-parallel-dump with all the ordinary options. Some of them are really specific to dumping data, but not all that many — most of the options are about choosing which databases to include and exclude, and so on.
  2. You’re adding a double dash -- to make it stop processing any further options.
  3. The rest of the arguments are being treated as a system command, but…
  4. Not before interpolating the database and table name into them. The %D and %N are a little macro language. There are some other macros too — see the documentation.

The net effect is to loop through all the tables and run OPTIMIZE TABLE on them.

MySQL Parallel Dump takes responsibility for noticing the exit status of the system command, keeping track of times, and reporting it all when it’s done. And its functionality for working on sets of things is also generic. You could easily create a table of “optimization jobs” and point it at that table, perhaps using the --age option, and it would obediently do what the table’s contents specify:

mysql> select setname, db, tbl from test.opti_job;
+-----------+--------+------------+
| setname   | db     | tbl        |
+-----------+--------+------------+
| dvd_store | sakila | film       | 
| dvd_store | sakila | film_actor | 
| set1      | test   | t1         | 
| set1      | test   | t2         | 
+-----------+--------+------------+
$ mysql-parallel-dump --nolocktables --sets set1,dvd_store --settable test.opti_job -- 'mysqlcheck --optimize %D %N > /dev/null'
        set1:              2 tables,     2 chunks,     2 successes,  0 failures,  0.14 wall-clock time,  0.17 dump time
   dvd_store:              2 tables,     2 chunks,     2 successes,  0 failures,  0.51 wall-clock time,  0.85 dump time
Final result:  2 sets,     4 tables,     4 chunks,     4 successes,  0 failures,  0.65 wall-clock time,  1.02 dump time

Much of the code for any kind of parallel tool is generic. I put a little extra time into this tool to make that code reusable, not special-purpose.

Written by Xaprb

October 3rd, 2007 at 10:43 pm

Posted in Uncategorized

Tagged with , , , , ,

One Response to 'How to check and optimize MySQL tables in parallel'

Subscribe to comments with RSS

  1. Thanks for this post. I am a little confused about the syntax. Here’s my command:

    mk-parallel-dump –port=3306 –user=user –password=pass –databases=myforum –tables=mytable1,mytable2 –noresume –verbose localhost — ‘/usr/bin/mysqlcheck –user=user –password=pass –port=3306 –check %D %N’

    And the output is this:

    CHUNK TIME EXIT SKIPPED DATABASE.TABLE
    tbl 0.02 0 0 forum.mytable1
    tbl 0.16 0 0 forum.mytable2
    db 0.16 0 0 forum
    all 0.16 0 0 -

    However, the tables have not been checked. Can you help me understand what I’m doing wrong?

    rolly

    14 Jan 11 at 2:51 pm

Leave a Reply