Archive for the ‘data recovery’ tag
I keep reading that Ma.gnolia’s data is permanently lost because “a specialist had been unable to recover any data from the corrupted hard drive.” That is not, in itself, a reason to consider the data completely lost.
It is not clear to me whether the hard drive itself is unusable, e.g. the spindle won’t spin and the head won’t read the ones and zeroes, or whether the filesystem is corrupted. It sounds to me, from reading Larry Halff’s comments, like it’s a simple matter of filesystem corruption. And even if the disk is dead, there is apparently a backup made from the corrupted filesystem, so there should be more than one way to try to recover this data: “Ma.gnolia’s database server suffered from file system corruption, which also corrupted it’s database backup, even though it was on a separate system.”
You don’t need to recover your filesystem to recover your MySQL data. Shameless plug: Percona can do it for you. We can get the raw data off a block device without even trying to mount it as a filesystem. Recovering MySQL data is not the same as recovering other types of data. If the disk spins, it might be possible to recover data from it.
Whether it’s worth it or not is another matter. Percona data recovery isn’t cheap, but it’s worth it for at least some people. I cannot name names, but you are using services from companies that have retained Percona to recover from worse cases of data loss than this appears to be, going by the limited available information. The original reason we built our data recovery toolset was to help one of the world’s largest corporations.
But cost and time may not have been the driving factor here. Whoever the unnamed data recovery specialist was, they took a long time and got no results. And now Ma.gnolia has given up and declared it a lost cause, which is sad for their users. I hope Larry Halff didn’t pay for the results he didn’t get. And I hope he hasn’t wiped out his corrupted backup yet.
In the meantime, at least this incident is shining a bright light on the need for tested, verified backups. I’ve had two clients ask me how they can avoid ending up the same way as Ma.gnolia.
This release fixes several bugs introduced in the last release as I replaced untested code with tested code — how ironic! Actually, I knew that was virtually guaranteed to happen. Anyway, all the bugs you’ve helped me find are now fixed. I also fixed a long-standing bug in MySQL Table Sync, which I am otherwise trying to touch as little as possible for the time being. (Remember to contribute to the bounty, and get your employer to contribute as well, so I can do some real work on it in the next month or so!)
The other big news is that the parallel dump and restore tools are now 1.0.0 because I consider them feature-complete. I have put the most work into tab-separated dumps. These two tools can do something MySQL AB’s tools can’t currently do: restore data before creating triggers (when doing tab-delimited dumps). That’s an obvious requirement for loading data when tables have triggers. If you create the triggers before loading the data, you’re practically guaranteed to end up with different data than was dumped. The tools now dump and reload both triggers and views. As long as you’re dumping the mysql database, I think they should be able to completely duplicate a server (my initial goal was just data, not routines/triggers/views/etc).
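The trigger-ordering problem is easy to demonstrate. Here is a minimal sketch using Python’s built-in sqlite3 module — SQLite stands in for MySQL purely for illustration, and the table, column, and trigger names are invented:

```python
import sqlite3

# Original server: a trigger maintains a derived value on insert.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
src.execute("""CREATE TRIGGER double_val AFTER INSERT ON t
    BEGIN UPDATE t SET val = NEW.val * 2 WHERE id = NEW.id; END""")
src.execute("INSERT INTO t VALUES (1, 5)")  # trigger fires: val becomes 10
dumped = src.execute("SELECT id, val FROM t").fetchall()  # the dumped rows

# Wrong order: create the trigger first, then reload the dumped rows.
# The trigger fires again on reload, so the restored data differs.
bad = sqlite3.connect(":memory:")
bad.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
bad.execute("""CREATE TRIGGER double_val AFTER INSERT ON t
    BEGIN UPDATE t SET val = NEW.val * 2 WHERE id = NEW.id; END""")
bad.executemany("INSERT INTO t VALUES (?, ?)", dumped)
print(bad.execute("SELECT val FROM t").fetchone()[0])  # 20, not 10

# Right order: load the data first, create the trigger afterward.
good = sqlite3.connect(":memory:")
good.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
good.executemany("INSERT INTO t VALUES (?, ?)", dumped)
good.execute("""CREATE TRIGGER double_val AFTER INSERT ON t
    BEGIN UPDATE t SET val = NEW.val * 2 WHERE id = NEW.id; END""")
print(good.execute("SELECT val FROM t").fetchone()[0])  # 10, matches the dump
```

Any trigger that modifies the table it fires on has this property, which is why data-before-triggers is the only safe restore order.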
Honestly, I hope MySQL’s tools make this pair of tools obsolete in the future, but until then, they’re a good way to dump and reload data at higher speeds. Keith Murphy did some measurements on parallel dump and restore speeds.
Here’s the full changelog:
Changelog for mysql-archiver:

2007-11-12: version 1.0.3
 * The --no-ascend option caused too many bind variables to be used.

Changelog for mysql-parallel-dump:

2007-11-12: version 1.0.0
 * Dump views when --tab is given.
 * Use a module to find databases and tables.
 * Do not shell out to mysqldump for --tab.
 * Removed the --opt option.
 * Check for valid options to mysqldump.
 * Dump table definition and triggers separately for --tab.

Changelog for mysql-parallel-restore:

2007-11-12: version 1.0.0
 * Removed the --sql option, as sort order is implied when --tab is given.
 * Added code to load .trg files (triggers) and load 00_views files.
 * Print out files that are not loaded.

Changelog for mysql-table-checksum:

2007-11-12: version 1.1.18
 * DSN parsing was broken.

Changelog for mysql-table-sync:

2007-11-12: version 0.9.9
 * DSN parsing was broken when --synctomaster was given with one DSN.
 * Changed --replicate to --synctomaster option.
 * Errors were being hidden in an EVAL when --execute was specified (bug #1819744).
The new release of MySQL Toolkit (version 1051) updates MySQL Parallel Dump in minor ways, but more importantly, it adds MySQL Parallel Restore.
MySQL Parallel Restore is the reverse of MySQL Parallel Dump. You give it one or more files and/or directories, and it discovers all the files contained within them and loads them in parallel. It understands how to load SQL and/or TXT/CSV files. If you give it some of both, it loads the SQL first and then loads the TXT/CSV as delimited files with LOAD DATA INFILE.
It does not parallelize a single table. That is, it doesn’t try to load two files into a table at the same time. But if you’re loading multiple tables, it will do them in parallel.
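The discovery-and-scheduling logic described above can be sketched in a few lines of Python. This is not MySQL Parallel Restore’s actual code — the function names, file-naming convention (one basename per table), and worker count are all my invention for illustration:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def discover(paths):
    """Expand files and directories into a flat list of dump files."""
    files = []
    for p in paths:
        if os.path.isdir(p):
            for root, _, names in os.walk(p):
                files.extend(os.path.join(root, n) for n in names)
        else:
            files.append(p)
    return files

def plan(files):
    """Group files by table (basename, in this sketch), SQL before data."""
    tables = {}
    for f in files:
        base, ext = os.path.splitext(os.path.basename(f))
        if ext in (".sql", ".txt", ".csv"):
            tables.setdefault(base, []).append(f)
    for flist in tables.values():
        # Within one table: .sql (definition) first, then delimited data files.
        flist.sort(key=lambda f: (not f.endswith(".sql"), f))
    return tables

def restore(tables, load_file, workers=4):
    """Tables load in parallel; one table's files load serially, in order."""
    def one_table(flist):
        for f in flist:
            load_file(f)  # e.g. execute the SQL, or LOAD DATA INFILE for .txt
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(one_table, tables.values()))
```

The key design point is the last function: parallelism is across tables, never within one, so each table’s definition and data always load in the right order.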
It has what I consider to be smart defaults. For example, by default it commits between each delimited file it loads. And since the dump tool makes it easy to dump a table in chunks, this makes it much easier on the server to restore a very large table.
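Committing between chunks keeps any single transaction small, so the server never has to buffer one enormous load. A minimal sketch of the idea, again using sqlite3 as a stand-in and an invented table and chunking scheme:

```python
import sqlite3

def load_chunks(conn, chunks):
    """Insert each chunk of rows and commit before moving to the next.
    A failure mid-restore rolls back only the current chunk, and the
    server never holds one huge transaction for the whole table."""
    loaded = 0
    for chunk in chunks:
        conn.executemany("INSERT INTO t (id, val) VALUES (?, ?)", chunk)
        conn.commit()  # one commit per dumped file/chunk
        loaded += len(chunk)
    return loaded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
chunks = [[(i, "x") for i in range(s, s + 3)] for s in (0, 3, 6)]
print(load_chunks(conn, chunks))  # 9 rows loaded, in three commits
```

With chunked dumps, each dumped file maps naturally to one such commit.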
I’ve been following the “release early, release often” philosophy with these two tools. You should test carefully before you trust them with your data. If you can’t restore your data accurately, it’s probably a bug. I’ve been testing with the following procedure:
mysql-table-checksum -a ACCUM localhost > checksum-before
mysqldump --all-databases | gzip -c - > sanity.gz
mysql-parallel-dump ... options ...
mysql-parallel-restore ... options ... default/
mysql-table-checksum -a ACCUM localhost > checksum-after
mysql-checksum-filter checksum-before checksum-after
If the checksums don’t match after restoring, you can restore the original data from the sanity dump. I encourage you to report any bugs you find with this procedure. Incidentally, this exercise taught me that LOAD DATA INFILE is pretty hard to get just right. It has all sorts of weird dependencies on character sets that aren’t documented. That’s why I’m a little cautious, and why I’m asking you to tell me if you can’t restore correctly.