This release of MySQL Toolkit fixes some minor bugs, and adds major new functionality to MySQL Parallel Dump.
Big News: MySQL Parallel Dump
I wrote a lot more tests and cleaned up MySQL Parallel Dump a lot (fixed bugs with failed dumps not being reported, for instance) but the really big news is I added chunking functionality to it. Now you can say
mysql-parallel-dump --chunksize 100000
and it will try to divide each table into chunks with 100,000 rows each. It can do the chunks in parallel, so it can actually be running several dumps from one table at the same time. The chunking is fuzzy: it’s a hard problem, and I adapted (and improved) the code from MySQL Table Checksum to do it. If you can improve it, please contribute your fixes (the Sourceforge project page has several ways for you to do that).
You can also dump by size, which is probably more useful for most people. To do 10MB per chunk (approximately), use this command:
mysql-parallel-dump --chunksize 10M
This is a big deal not just because it lets you parallelize dumps from a single table, but because having the dump split up makes it easier to restore in small chunks, which as readers have pointed out is a big help on transactional storage engines.
The parallel restore tool is in incubation. In the meantime, please put this tool through its paces. Clearly it’s not yet well-tested and I look forward to your bug reports!
Changelog
Changelog for mysql-find: 2007-10-03: version 0.9.5 * The --dbregex parameter didn't work right. Changelog for mysql-heartbeat: 2007-10-03: version 1.0.1 * --check hung forever. Changelog for mysql-parallel-dump: 2007-10-03: version 0.9.6 * Arguments to external program weren't honored. * System exit codes were lost, so errors weren't reported. * Added chunking. * Modularized and tested. * Added documentation. * Made --locktables negatable. * Changed default output to be less verbose and added --verbose option. * Added summary output.Technorati Tags:backups, mysql, parallel dump, replication, sql, transactions

Well done! This looks like an awesome release. You mentioned that you missed errors (not checking exit codes), was one of the usual errors you hit getting no data dumped from your table?
I’ve done the same thing, but quite ugly and in bash, and for some reason, sometimes I would get the table definition and not the table data (I was using mysqldump –tab). Just wanted to know if you’d hit that.
I’ll hopefully be able to try it out later today and I’ll give you some feedback. The chunking of a dump sounds very nice :)
I haven’t seen specific sneaky errors like that. What was happening was there’s double-forking going on in the Perl program. First I fork a Perl worker process to handle each bit of work. Then it possibly makes a system call to run mysqldump.
At each step, when either the forking or the system call finishes, I gather the exit code of the process. Say the system call returned 2 (a bad command-line argument). When I gathered the exit code, I needed to shift it right by 8 bits to get the true exit code. Then my forked Perl process calls exit(2) and the main process does a waitpid() and gets that value.
The trouble was I wasn’t doing the shift-right so my child Perl process was calling exit(512) or some other large number, and apparently if you exit() with a value larger than 255 the parent will get 0 from waitpid(). So the mysqldump invocation was returning 2 and the program was eventually getting 0 — success, everything worked! NOT.
With the corrections, I was able to deliberately cause a bad command-line, and a permissions error, and verify that it’s reported as a failure in the eventual status summary.
Ah, interesting error. I had some problems with forking as well, mostly since I did it in bash I suppose. I’m no good with perl.
I’m loading a large dataset now, will try dumping afterwards. Do you have a set plan on how you want to do a parallel restore or would you get a lot of help if I gave you some dirty sh that would probably get it done, just as a proof of concept or something.
I don’t have a whole lot of plan on the restore yet. I was planning to provide two ways again: a wrapper around LOAD DATA INFILE, and shelling out to “gunzip -c db/tbl.000.sql.gz | mysql db” A lot of the wrapper will just be handling things like foreign keys, committing between chunks, etc I think — that and discovering files and deciding which db/tbl they belong in.
So yes, please send me the shell script. You can even attach it to a bug report at the Sourceforge page if you like. You can email me at xaprb at this domain. Thanks!