Tag Archive for 'parallel-dump'

MySQL Toolkit version 1254 released

Download MySQL Toolkit

This release fixes several bugs introduced in the last release as I replaced untested code with tested code — how ironic! Actually, I knew that was virtually guaranteed to happen. Anyway, all the bugs you’ve helped me find are now fixed. I also fixed a long-standing bug in MySQL Table Sync, which I am otherwise trying to touch as little as possible for the time being. (Remember to contribute to the bounty, and get your employer to contribute as well, so I can do some real work on it in the next month or so!)

The other big news is that the parallel dump and restore tools are now 1.0.0 because I consider them feature-complete. I have put the most work into tab-separated dumps. These two tools can do something MySQL AB’s tools can’t currently do: restore data before creating triggers (when doing tab-delimited dumps). That’s an obvious requirement for loading data when tables have triggers. If you create the triggers before loading the data, you’re practically guaranteed to end up with different data than was dumped. The tools now dump and reload both triggers and views. As long as you’re dumping the mysql database, I think they should be able to completely duplicate a server (my initial goal was just data, not routines/triggers/views/etc).

Honestly, I hope MySQL’s tools make this pair of tools obsolete in the future, but until then, they’re a good way to dump and reload data at higher speeds. Keith Murphy did some measurements on parallel dump and restore speeds.

Here’s the full changelog:

Changelog for mysql-archiver:

2007-11-12: version 1.0.3

   * The --no-ascend option caused too many bind variables to be used.

Changelog for mysql-parallel-dump:

2007-11-12: version 1.0.0

   * Dump views when --tab is given.
   * Use a module to find databases and tables.
   * Do not shell out to mysqldump for --tab.
   * Removed the --opt option.
   * Check for valid options to mysqldump.
   * Dump table definition and triggers separately for --tab.

Changelog for mysql-parallel-restore:

2007-11-12: version 1.0.0

   * Removed the --sql option, as sort order is implied when --tab is given.
   * Added code to load .trg files (triggers) and load 00_views files.
   * Print out files that are not loaded.

Changelog for mysql-table-checksum:

2007-11-12: version 1.1.18

   * DSN Parsing was broken.

Changelog for mysql-table-sync:

2007-11-12: version 0.9.9

   * DSN parsing was broken when --synctomaster was given with one DSN.
   * Changed --replicate to --synctomaster option.
   * Errors were being hidden in an EVAL when --execute was specified (bug #1819744).
Technorati Tags:, , , ,

You might also like:

  1. Introducing MySQL Parallel Restore
  2. MySQL Toolkit version 946 released
  3. Maatkit version 1709 released
  4. Maatkit version 1877 released
  5. Maatkit version 1508 released

Introducing MySQL Parallel Restore

Download MySQL Toolkit

The new release of MySQL Toolkit (version 1051) updates MySQL Parallel Dump in minor ways, but more importantly, it adds MySQL Parallel Restore.

MySQL Parallel Restore is the reverse of MySQL Parallel Dump. You give it one or more files and/or directories, and it discovers all the files contained within them and loads them in parallel. It understands how to load SQL and/or TXT/CSV files. If you give it some of both, it loads the SQL first and then loads the TXT/CSV as delimited files with LOAD DATA INFILE.

It does not parallelize a single table. That is, it doesn’t try to load two files into a table at the same time. But if you’re loading multiple tables, it will do them in parallel.

It has what I consider to be smart defaults. For example, by default it commits between each delimited file it loads. And since the dump tool makes it easy to dump a table in chunks, this makes it much easier on the server to restore a very large table.

I’ve been following the “release early, release often” philosophy with these two tools. You should test carefully before you trust them with your data. If you can’t restore your data accurately, it’s probably a bug. I’ve been testing with the following procedure:

mysql-table-checksum -a ACCUM localhost > checksum-before
mysqldump --all-databases | gzip -c - > sanity.gz
mysql-parallel-dump ... options ... 
mysql-parallel-restore ... options ... default/
mysql-table-checksum -a ACCUM localhost > checksum-after
mysql-checksum-filter checksum-before checksum-after

If the checksums don’t match after restoring, you can restore the original data from the sanity dump. I encourage you to report any bugs you find with this procedure. Incidentally, this exercise taught me that LOAD DATA INFILE is pretty hard to get just right. It has all sorts of weird dependencies on character sets that aren’t documented. That’s why I’m a little cautious and I’m asking you to tell me if you can’t restore correctly.

Technorati Tags:, , , ,

You might also like:

  1. MySQL Toolkit version 1254 released
  2. MySQL Toolkit version 946 released
  3. Maatkit version 1674 released
  4. MySQL Toolkit version 1030 released
  5. MySQL Toolkit version 989 released

MySQL Toolkit version 1030 released

Download MySQL Toolkit

This release of MySQL Toolkit updates MySQL Parallel Dump. Together you and I found a few bugs in it (table locking, argument quoting, exit status code). The restore utility is in progress.

Technorati Tags:, , , ,

You might also like:

  1. MySQL Toolkit version 1011 released
  2. Progress on High Performance MySQL Backup and Recovery chapter
  3. MySQL Toolkit version 946 released
  4. Introducing MySQL Parallel Restore
  5. MySQL Toolkit version 1254 released

MySQL Toolkit version 1011 released

Download MySQL Toolkit

MySQL Toolkit version numbers are based on Subversion revision number. This release is the first past the 1,000-commit milestone. It also marks several days of being in Sourceforge’s top 100 most active projects. It has been in the top 300 for a couple of months, and the top 1000 for, um, a long time. While I would hasten to say I’m not a popularity-contest-focused person, it’s rewarding to see that people think this project is important and useful.

This release of MySQL Toolkit updates MySQL Parallel Dump. I had been using it on a relatively small server; yesterday I took a deep breath and started using it to generate backups from a large server with lots of data and lots of queries. Of course I found a couple bugs and decided I needed more functionality and error handling. The major new functionality is for efficiency; it defers locking as late as possible and releases locks as soon as possible, and with the --setperdb option it treats each database as a set to be locked and dumped together. I also added some information that will be helpful when restoring a table dumped in chunks: the range of values in each chunk. And finally, I made it able to deal with some race conditions like a table being dropped between the time it’s discovered and the time it’s locked (this is very relevant for me because I avoid temporary tables so replication is restartable).

I don’t have a timeline for when I’ll write the corresponding restore utility, but the answer is probably “soon.” This is very much a need-driven project. To begin with, I’m replacing a dump system that didn’t allow point-in-time recovery. Now I’ve got the data I need for point-in-time recovery, but if I have to do that it’ll be a manual job until I write the restore utility.

I am very focused on recovery, not backup, as you’ll see if you buy the second edition of High Performance MySQL :-) I’m just solving my needs in the order of urgency: one must have a backup to do a restoration. I generally don’t like the “urgent, fix now” approach! (For various reasons I won’t get into, I am not able to use ZRM, but I would ordinarily recommend it over rolling your own solution).

Technorati Tags:, , , ,

You might also like:

  1. Progress on High Performance MySQL Backup and Recovery chapter
  2. MySQL Toolkit version 946 released
  3. Introducing MySQL Parallel Dump
  4. High Performance MySQL, Second Edition: Backup and Recovery
  5. MySQL Toolkit version 1030 released

MySQL Toolkit version 989 released

Download MySQL Toolkit

This release of MySQL Toolkit fixes some minor bugs, and adds major new functionality to MySQL Parallel Dump.

Big News: MySQL Parallel Dump

I wrote a lot more tests and cleaned up MySQL Parallel Dump a lot (fixed bugs with failed dumps not being reported, for instance) but the really big news is I added chunking functionality to it. Now you can say

mysql-parallel-dump --chunksize 100000

and it will try to divide each table into chunks with 100,000 rows each. It can do the chunks in parallel, so it can actually be running several dumps from one table at the same time. The chunking is fuzzy: it’s a hard problem, and I adapted (and improved) the code from MySQL Table Checksum to do it. If you can improve it, please contribute your fixes (the Sourceforge project page has several ways for you to do that).

You can also dump by size, which is probably more useful for most people. To do 10MB per chunk (approximately), use this command:

mysql-parallel-dump --chunksize 10M

This is a big deal not just because it lets you parallelize dumps from a single table, but because having the dump split up makes it easier to restore in small chunks, which as readers have pointed out is a big help on transactional storage engines.

The parallel restore tool is in incubation. In the meantime, please put this tool through its paces. Clearly it’s not yet well-tested and I look forward to your bug reports!

Changelog

coffee grinder cuisinart coffee grinder la pavoni coffee grinder black and decker coffee grinder bodum coffee grinder mahlkonig coffee grinder mr coffee coffee grinder hamilton beach coffee grinder bunn coffee grinder jura coffee grinder astra coffee grinder delonghi coffee grinder grindmaster coffee grinder burr coffee grinder brewer coffee grinder bosch coffee grinder melitta coffee grinder electric coffee grinder antique coffee grinder electric skillet presto electric skillet rival electric skillet west bend electric skillet villaware electric skillet toastess electric skillet black and decker electric skillet hamilton beach electric skillet cuisinart electric skillet sunpentown electric skillet aroma electric skillet sunbeam electric skillet saladmaster electric skillet farberware electric skillet oster electric skillet ge electric skillet
Changelog for mysql-find:

2007-10-03: version 0.9.5

   * The --dbregex parameter didn't work right.

Changelog for mysql-heartbeat:

2007-10-03: version 1.0.1

   * --check hung forever.

Changelog for mysql-parallel-dump:

2007-10-03: version 0.9.6

   * Arguments to external program weren't honored.
   * System exit codes were lost, so errors weren't reported.
   * Added chunking.
   * Modularized and tested.
   * Added documentation.
   * Made --locktables negatable.
   * Changed default output to be less verbose and added --verbose option.
   * Added summary output.
Technorati Tags:, , , , ,

You might also like:

  1. MySQL Toolkit version 1254 released
  2. Introducing MySQL Parallel Restore
  3. MySQL Toolkit version 946 released
  4. MySQL Toolkit version 1030 released
  5. How to check and optimize MySQL tables in parallel

MySQL Toolkit version 946 released

Download MySQL Toolkit

This release of MySQL Toolkit adds a new tool, fixes some minor bugs, and adds new functionality to one of the helper scripts.

New tool: MySQL Parallel Dump

I wrote an introduction to MySQL Parallel Dump yesterday. It’s a much smarter way to dump your data if you have a lot of it, and it’s actually a very usable lightweight multi-threaded backup tool (it can do most dump-oriented backup jobs without a wrapper script, in my opinion).

Changelog

Changelog for mysql-parallel-dump:

2007-10-01: version 0.9.5

   * Initial release.

Changelog for mysql-table-checksum:

2007-10-01: version 1.1.16

   * Made mysql-checksum-filter able to compare tables in different databases.

Changelog for mysql-table-sync:

2007-10-01: version 0.9.7

   * The special command-line syntax didn't allow a bare hostname.
   * Added an informative printout of what is being synced.
Technorati Tags:, , , ,

You might also like:

  1. Maatkit version 1508 released
  2. Introducing MySQL Parallel Restore
  3. MySQL Toolkit version 1254 released
  4. Maatkit version 1579 released
  5. Maatkit version 1877 released

Introducing MySQL Parallel Dump

A while ago Peter Zaitsev wrote about his wishes for mysqldump. These included multi-threaded dumps and “safe” dumps that would wait for a server to restart if it crashed, then keep dumping other tables. I’ve had sketches of this done for a while, but during this week I fleshed it out while writing about backup and recovery for our upcoming book. I had my own list of features I wanted to add:

  • Support for backup sets, with the backup configuration stored in the database itself.
  • Emphasis on tab-delimited dumps.
  • Sane defaults, focused on ease of use and ease of recovery.
  • Support for compression by default.

The resulting script is satisfactory to me. If you just run it without arguments, it connects to the server mentioned in your .my.cnf file and dumps all databases and tables, one file per table, gzipped, in parallel (at least two, but by default it detects the number of CPUs and runs that many in parallel).

baron@kanga $ mysql-parallel-dump
SET     DATABASE TABLE                      TIME STATUS THREADS
default mysql    columns_priv                  0      0       2
default mysql    db                            0      0       2
default mysql    help_category                 0      0       2
default mysql    func                          0      0       2
default mysql    help_keyword                  0      0       2
...snip...
default test     t1                            0      0       2
default test     t2                            0      0       1

You can tell it to dump elsewhere, and it’s easy to dump all tables in tab-delimited format. Here it’s reading its configuration from the database, writing to /tmp, and not backing up tables that have been dumped in the last 5 minutes:

baron@kanga $ mysql-parallel-dump --basedir /tmp --tab --sets set1 \
    --settable test.backupset --age 5m
Nothing to do for set set1
baron@kanga $ ls -lR /tmp/set1
/tmp/set1:
total 8
-rw-rw-rw- 1 baron baron   40 2007-09-30 21:43 00_master_data.sql
drwxrwxrwx 2 baron baron 4096 2007-09-30 21:43 test

/tmp/set1/test:
total 16
-rw-rw-rw- 1 baron baron 549 2007-09-30 21:43 t1.sql.gz
-rw-rw-rw- 1 baron baron  31 2007-09-30 21:43 t1.txt.gz
-rw-rw-rw- 1 baron baron 550 2007-09-30 21:43 t2.sql.gz
-rw-rw-rw- 1 baron baron  29 2007-09-30 21:43 t2.txt.gz

And as you can see, it knows I’ve dumped those tables recently and didn’t do them again. Pretty handy for scheduling and resuming backups, no? It makes it easy to keep going if something happens in the middle of the backup and you want to restart.

I’m aware of the similar mysqlpdump script, and I generally don’t like duplicating other people’s efforts, but I decided to go ahead and finish what I’d started. To tell you the truth, neither script is complicated. It’s just a matter of providing a sensible wrapper around existing functionality (in my case, that’s mysqldump and SELECT INTO OUTFILE, which I do directly rather than asking mysqldump to do it with -T, which just makes mysqldump into the same kind of wrapper). I also wanted to provide it as part of the MySQL Toolkit, so it’s all in the same place. Frankly, I also built in a lot more functionality than mysqlpdump has, and I consider the defaults to be more useful. I’d love for mysqldump itself to have better defaults — especially for dumping large datasets, which it’s frankly pretty poor at right now. One of these days MySQL AB will make me obsolete, I just know it…

Oh, and in keeping with my tradition, it’s sort of ridiculously sophisticated and overly generic. It has a little macro language that you can use to basically turn it into a loop iterator over the selected databases and tables, and run any command you wish. Here’s an example:

mysql-parallel-dump -- mysqldump --skip-lock-tables '%D' '%N' \| gzip --fast -c - \> '%D.%N.gz'

That basically duplicates the built-in defaults (except the defaults are actually a lot more complicated than that). But it illustrates how you could use this as a shell to select which tables to dump and fork off sub-processes, handling all the locking, error checking, and so forth for them. Here I’m spawning off mysqldump, but it would be just as easy to execute a custom script.

There’s one more wish Peter and I both have, but which is impossible for right now as far as we know. That’s to do parallel SELECT INTO OUTFILE dumps for a bunch of tables in one transaction. This will not be possible until more than one connection can participate in a single transaction. Ask the MySQL guys about that one!

This script is part of MySQL Toolkit and will be released as soon as I have time. There are a few other bug fixes I want to include. In the meantime, if you’re dying to get it, you can grab it from the MySQL Toolkit subversion repository.

Technorati Tags:, , , , , , ,

You might also like:

  1. MySQL Toolkit version 946 released
  2. Introducing MySQL Parallel Restore
  3. Progress on High Performance MySQL Backup and Recovery chapter
  4. MySQL Toolkit version 1254 released
  5. Introducing MySQL Deadlock Logger