Xaprb

Stay curious!

Introducing MySQL Table Sync

with 5 comments

Download MySQL Table Sync

MySQL Table Sync efficiently finds and resolves data (not structural) differences between two MySQL tables, which may be on different servers. It offers two search algorithms to find the rows that differ, and several methods to bring the destination table into sync with the source.

It’s based on my earlier discussion on how to find and resolve data differences between MySQL tables and is part of the MySQL Toolkit project.

Acknowledgments are due before anything else: thanks to Giuseppe Maxia, who laid the foundation several years ago (and released code for download too). Thanks to Fabien Coelho, who extended this work into pg_comparator and wrote a very detailed analysis of the technique (see also “Remote Comparison of Database Tables”). Fabien kindly answered some questions I had over the last weeks.

Efficient search for differences

I implemented both the top-down algorithm I proposed in my earlier post, and the bottom-up search algorithm developed by Maxia and Coelho.

The top-down search is my original work, and I implemented it nearly as proposed, except I converted it to breadth-first instead of depth-first search for technical reasons (there are also some practical advantages to breadth-first search). I also haven’t implemented all the grouping strategies I proposed. Only simple groupings on columns are implemented.

I reworked the bottom-up algorithm to add running counts, indexed modulus columns, and some other efficiency, correctness and optimization changes. However, it is about 80% based on Coelho’s algorithm, and I consulted the pg_comparator source code frequently while writing it.

Both algorithms are good for certain cases and poor for others.

Methods to resolve the differences

I implemented a variety of options to resolve the differences between the tables, including one that relies on replication, which is ideally suited for re-syncing slaves that have drifted without having to re-initialize them completely. There are also many options to control locking, waiting for the slave to catch up to the master, and so on. This part of the tool is not terribly difficult or complex, but it’s something no one has quite done before to my knowledge.

Work on this remains, however.

The tool’s present and future

The documentation is incomplete as of yet, and there are some things I still want to implement, but I wanted to get it out (release early, release often). I have used the tool in lots of tests, and have not found anything incorrect yet, but that doesn’t mean there are no bugs. I have also used it — cautiously — in production. I re-synced a table that had drifted out of sync on a slave.

If you want to try it, I encourage you to run it in --print mode to begin with, and examine carefully the queries it emits. Its output is valid SQL, so you can actually execute the output to sync the destination table.

I’m sure there will be many changes to functionality, behavior and output in the future.

I really hope you’ll try it and let me know what you think.

About MySQL Toolkit

MySQL Toolkit is a set of essential tools for MySQL users, developers and administrators. The project’s goal is to make high-quality command-line tools that follow the UNIX philosophy of doing one thing and doing it well. They are designed for scriptability and ease of processing with standard command-line utilities such as awk and sed. Other tools in the toolkit include a table checksummer and a duplicate key checker.

Written by Xaprb

March 18th, 2007 at 9:01 pm

Posted in Uncategorized

5 Responses to 'Introducing MySQL Table Sync'

Subscribe to comments with RSS

  1. I have some doubt after testing your tool

    1)what to do when we are trying to sync blob as pk?

    2)There are some cases when there is no pk in tables,in that case it will replicate all source rows along with dest rows?
    as you have mentioned without pk its not worth to sync!
    still i have some cases when its required?

    3)sometimes i am getting hash ref failed for even a simple tables?

    4)I dont know perl so, how can i know when to use topdown and bottomup?

    please reply its urgent

    xart

    11 May 07 at 2:06 am

  2. Hi xart,

    I don’t understand some of your questions. Let’s discuss further on the mysqltoolkit-discuss mailing list, please. There are some other people on that list who may be interested in the same things as you.

    Xaprb

    11 May 07 at 8:54 am

  3. new name and downloadpage: http://sourceforge.net/projects/maatkit

    sqlFan

    16 Jan 08 at 5:42 am

  4. There’s also a [less ambitious more simple tool] I built in Ruby
    http://rubyforge.org/projects/table-syncer/
    Enjoy!
    -=r

    roger

    28 Dec 08 at 1:44 am

  5. Hi Xarprb, I am looking for a tool that syncs tables in mySQL. I have a Joomla website with multiple extensions. I need the data from one field copied to another field automatically. I am not that technical. Would this tool be OK? Are there better tools out there?

    Thanks for any assistance.

    Air Monitoring

    7 May 09 at 10:07 pm

Leave a Reply