Xaprb

Stay curious!

Archive for the ‘mysqldump’ tag

A review of SecoBackup Amazon S3 backups for MySQL

with 5 comments

After I wrote about things you need to know about MySQL backups, a customer contacted me and asked me what I know about SecoBackup for MySQL. I see it has a very low cost and Percona has Amazon accounts for testing purposes, so I quickly downloaded s3sql_2.2.0.1-2.01_i386.deb, installed it, configured it, and gave it a whirl.

Since I just want to see what it does to take a backup, I started up a sandbox running from /tmp/12345 and configured it to backup msandbox:msandbox@127.0.0.1 (I shut down my main mysqld on my laptop to make sure it can’t connect to the default instance).

Then I configured a backup set and tried to take a backup. Right away I saw it isn’t full featured enough. It doesn’t let you specify a port to connect to. This is fixable. They just have to add better support for all the command-line parameters like port, SSL, socket, etc. It is not enough to specify host, username and password. But it gives me a funny feeling about their level of experience with production MySQL servers.

And so I expected to see it using mysqldump. I tried a backup, and indeed I saw it’s using mysqldump. It could not connect to port 12345, which is where I set up my sandbox. I see no way to make it do that. I tried changing the port to the default 3306 and tried again, but it still doesn’t manage to connect to it to take a backup.

In any case, mysqldump might be a fine way to take a backup, for some people, but I need to see what command-line parameters it’s using before I can determine. So I hacked around for a while, got it to take a backup of my default instance, and eventually found out how to get the parameters it used to dump the data. Here are the parameters:

–opt –extended-insert –single-transaction –default-character-set=utf8 –create-options

I also turned on the log to my mysql instance and examined it afterwards to find out more about what this software really does to the database. I didn’t see anything unusual.

So having done that, how does SecoBackup stack up against my list of ten things to know?

  1. Does the backup require shutting down MySQL? If not, what is the impact on the running server? Blocking, I/O load, cache pollution, etc? Answer: It uses mysqldump, so you can use it online, but you’ll get all the usual stuff: cache pollution, extra load on the server, and so on. For large databases, mysqldump is unusable, so it won’t work for them.
  2. What technique is used for the backup? Is it mysqldump or a custom product that does something similar? Is it a filesystem copy? Answer: mysqldump.
  3. Does the backup system understand that you cannot back up InnoDB by simply copying its files? Answer: It doesn’t do file copies.
  4. Does the backup use FLUSH TABLES, LOCK TABLES, or FLUSH TABLES WITH READ LOCK? These all interrupt processing. Answer: None of the above.
  5. What other effects are there on MySQL? I’ve seen systems that do a RESET MASTER, which immediately breaks replication. Are there any FLUSH commands at all, like FLUSH LOGS? Answer: None of the above.
  6. How does the system guarantee that you can perform point-in-time recovery? Answer: It does not. It doesn’t capture the binary log positions.
  7. How does the system guarantee consistency with the binary log, InnoDB logs, and replication? Answer: It does not. It doesn’t capture the binary log positions.
  8. Can you use the system to set up new MySQL replication slaves? How? Answer: No. It does not capture the master log positions.
  9. Does the system verify that the backup is restorable, e.g. does it run InnoDB recovery before declaring success? Answer: It does not verify backups.
  10. Does anyone stand behind it with support, and guarantee working, recoverable backups? How strong is the legal guarantee of this and how much insurance do they have? Answer: I’ll skip this question.

And then there’s the other question someone asked in the comments on the original article: how long does it take to restore the backup? The answer is, for big databases it’s going to take A Very Long Time. This is another reason why mysqldump is unusable for backing up large databases.

Overall I’m not all that impressed with the quality of the software; I crashed it a number of times trying to set things up and take backups, and it does sort of naive things like print output without a trailing linebreak so the terminal gets messed up. But whether it’s a good choice really depends, I think, on your data. (Naturally.) It seems like it’s pretty convenient, but even if they fixed the problems and added –master-data to the mysqldump options, it would absolutely not work for a lot of the systems I work on. Even if you could back up some of those servers with mysqldump, it would take way too long to restore.

But here is the most important thing: It is neither Open Source nor Free Software, and I had to download it and try it out to find out that it uses mysqldump rather than some other technique. Nowhere on their website does SecoBackup mention that they take backups via mysqldump. That’s basic information that I would like to see right up front. Any backup system should disclose how it works.

Written by Baron Schwartz

March 7th, 2009 at 3:51 pm

Posted in SQL,Tools

Tagged with , , ,

Restoring from a mysqldump into tables with triggers

with 2 comments

This is actually old news, but I never thought to file a bug report (until now) or say anything to anyone about it. If you use mysqldump to dump and restore a MySQL table that has INSERT triggers, you can get different data in your restored database than you had when you dumped. The problem? The tool dumps the triggers before the data, so they get added back to the table before the rows are inserted.

The fix for this is really trivial. Just move the triggers after the INSERTs. Someone should patch mysqldump, but I’ve been too lazy… and besides, you know what I say about me and C programming. Unsafe at any speed.

This isn’t a problem with mk-parallel-dump, by the way. It puts the triggers in separate files, and mk-parallel-restore adds them back after the data is restored.

Written by Baron Schwartz

January 8th, 2009 at 8:25 am

Posted in Maatkit,SQL

Tagged with , , ,

Introducing MySQL Parallel Dump

with 19 comments

A while ago Peter Zaitsev wrote about his wishes for mysqldump. These included multi-threaded dumps and “safe” dumps that would wait for a server to restart if it crashed, then keep dumping other tables. I’ve had sketches of this done for a while, but during this week I fleshed it out while writing about backup and recovery for our upcoming book. I had my own list of features I wanted to add:

  • Support for backup sets, with the backup configuration stored in the database itself.
  • Emphasis on tab-delimited dumps.
  • Sane defaults, focused on ease of use and ease of recovery.
  • Support for compression by default.

The resulting script is satisfactory to me. If you just run it without arguments, it connects to the server mentioned in your .my.cnf file and dumps all databases and tables, one file per table, gzipped, in parallel (at least two, but by default it detects the number of CPUs and runs that many in parallel).

baron@kanga $ mysql-parallel-dump
SET     DATABASE TABLE                      TIME STATUS THREADS
default mysql    columns_priv                  0      0       2
default mysql    db                            0      0       2
default mysql    help_category                 0      0       2
default mysql    func                          0      0       2
default mysql    help_keyword                  0      0       2
...snip...
default test     t1                            0      0       2
default test     t2                            0      0       1

You can tell it to dump elsewhere, and it’s easy to dump all tables in tab-delimited format. Here it’s reading its configuration from the database, writing to /tmp, and not backing up tables that have been dumped in the last 5 minutes:

baron@kanga $ mysql-parallel-dump --basedir /tmp --tab --sets set1 \
    --settable test.backupset --age 5m
Nothing to do for set set1
baron@kanga $ ls -lR /tmp/set1
/tmp/set1:
total 8
-rw-rw-rw- 1 baron baron   40 2007-09-30 21:43 00_master_data.sql
drwxrwxrwx 2 baron baron 4096 2007-09-30 21:43 test

/tmp/set1/test:
total 16
-rw-rw-rw- 1 baron baron 549 2007-09-30 21:43 t1.sql.gz
-rw-rw-rw- 1 baron baron  31 2007-09-30 21:43 t1.txt.gz
-rw-rw-rw- 1 baron baron 550 2007-09-30 21:43 t2.sql.gz
-rw-rw-rw- 1 baron baron  29 2007-09-30 21:43 t2.txt.gz

And as you can see, it knows I’ve dumped those tables recently and didn’t do them again. Pretty handy for scheduling and resuming backups, no? It makes it easy to keep going if something happens in the middle of the backup and you want to restart.

I’m aware of the similar mysqlpdump script, and I generally don’t like duplicating other people’s efforts, but I decided to go ahead and finish what I’d started. To tell you the truth, neither script is complicated. It’s just a matter of providing a sensible wrapper around existing functionality (in my case, that’s mysqldump and SELECT INTO OUTFILE, which I do directly rather than asking mysqldump to do it with -T, which just makes mysqldump into the same kind of wrapper). I also wanted to provide it as part of the MySQL Toolkit, so it’s all in the same place. Frankly, I also built in a lot more functionality than mysqlpdump has, and I consider the defaults to be more useful. I’d love for mysqldump itself to have better defaults — especially for dumping large datasets, which it’s frankly pretty poor at right now. One of these days MySQL AB will make me obsolete, I just know it…

Oh, and in keeping with my tradition, it’s sort of ridiculously sophisticated and overly generic. It has a little macro language that you can use to basically turn it into a loop iterator over the selected databases and tables, and run any command you wish. Here’s an example:

mysql-parallel-dump -- mysqldump --skip-lock-tables '%D' '%N' \| gzip --fast -c - \> '%D.%N.gz'

That basically duplicates the built-in defaults (except the defaults are actually a lot more complicated than that). But it illustrates how you could use this as a shell to select which tables to dump and fork off sub-processes, handling all the locking, error checking, and so forth for them. Here I’m spawning off mysqldump, but it would be just as easy to execute a custom script.

There’s one more wish Peter and I both have, but which is impossible for right now as far as we know. That’s to do parallel SELECT INTO OUTFILE dumps for a bunch of tables in one transaction. This will not be possible until more than one connection can participate in a single transaction. Ask the MySQL guys about that one!

This script is part of MySQL Toolkit and will be released as soon as I have time. There are a few other bug fixes I want to include. In the meantime, if you’re dying to get it, you can grab it from the MySQL Toolkit subversion repository.

Written by Baron Schwartz

September 30th, 2007 at 10:20 pm