<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: An algorithm to find and resolve data differences between MySQL tables</title>
	<link>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/</link>
	<description>Stay curious!</description>
	<pubDate>Sun, 20 Jul 2008 22:54:46 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.2</generator>

	<item>
		<title>By: Xaprb</title>
		<link>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-13502</link>
		<author>Xaprb</author>
		<pubDate>Tue, 09 Oct 2007 12:52:04 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-13502</guid>
		<description>But by all means, explore your algorithm too!  I don't mean to say you shouldn't.  It may be a much better way.</description>
		<content:encoded><![CDATA[<p>But by all means, explore your algorithm too!  I don&#8217;t mean to say you shouldn&#8217;t.  It may be a much better way.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Xaprb</title>
		<link>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-13501</link>
		<author>Xaprb</author>
		<pubDate>Tue, 09 Oct 2007 12:50:36 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-13501</guid>
		<description>Please take a look at MySQL Table Sync in the MySQL Toolkit (http://mysqltoolkit.sourceforge.net).  It may save you a lot of work.  I've implemented both algorithms there.</description>
		<content:encoded><![CDATA[<p>Please take a look at MySQL Table Sync in the MySQL Toolkit (http://mysqltoolkit.sourceforge.net).  It may save you a lot of work.  I&#8217;ve implemented both algorithms there.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Negruzzi Cristian</title>
		<link>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-13500</link>
		<author>Negruzzi Cristian</author>
		<pubDate>Tue, 09 Oct 2007 12:38:21 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-13500</guid>
		<description>Nice article. I've studied the algorithm before, but never saw it explained so clearly.
Actually, I have the same need to keep my databases synchronized, and for some days I've been trying to build a suitable algorithm for it. The Top-Down algorithm is good, but, as you mentioned, it is too hard to find a good grouping. And what should be done if the only indexed column is the primary key? So I find the Bottom-Up approach more suitable for that purpose, but with some differences: I'm going to build a B*-Tree based on row checksums for each table, keep it saved locally (I think an XML structure is a good way), and, to save time and traffic, do the comparison locally too. I'm not sure it's the best way, but I want to try.
Best regards.</description>
		<content:encoded><![CDATA[<p>Nice article. I&#8217;ve studied the algorithm before, but never saw it explained so clearly.<br />
Actually, I have the same need to keep my databases synchronized, and for some days I&#8217;ve been trying to build a suitable algorithm for it. The Top-Down algorithm is good, but, as you mentioned, it is too hard to find a good grouping. And what should be done if the only indexed column is the primary key? So I find the Bottom-Up approach more suitable for that purpose, but with some differences: I&#8217;m going to build a B*-Tree based on row checksums for each table, keep it saved locally (I think an XML structure is a good way), and, to save time and traffic, do the comparison locally too. I&#8217;m not sure it&#8217;s the best way, but I want to try.<br />
Best regards.</p>
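<p>The checksum-tree idea in this comment can be sketched with a Merkle-style binary hash tree, a simpler structure than the B*-Tree mentioned above: the roots are compared first, and the search descends only into branches whose hashes differ. This is a minimal sketch under stated assumptions: both sides hold the same non-empty, sorted set of primary keys (so it finds changed rows, not inserted or deleted ones), <code>repr()</code> stands in for a real row serialization, and the helper names <code>row_hash</code>, <code>build_tree</code>, <code>diff_leaves</code> and <code>changed_keys</code> are hypothetical.</p>

```python
import hashlib

# Hypothetical sketch of a Merkle-style binary hash tree over row checksums.
# Rows are (primary_key, row) pairs sorted by key; both sides are assumed to
# be non-empty and to hold the same keys, so only changed rows are found.


def row_hash(row):
    # repr() stands in for a real, column-by-column row serialization.
    return hashlib.md5(repr(row).encode("utf-8")).hexdigest()


def build_tree(leaf_hashes):
    # levels[0] holds the leaves; each higher level hashes adjacent pairs.
    levels = [leaf_hashes]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([
            hashlib.md5("".join(prev[i:i + 2]).encode("utf-8")).hexdigest()
            for i in range(0, len(prev), 2)
        ])
    return levels


def diff_leaves(a, b, level=None, index=0):
    # Compare node `index` at `level`; descend only where the hashes differ.
    if level is None:
        level = len(a) - 1
    if a[level][index] == b[level][index]:
        return []
    if level == 0:
        return [index]
    below = len(a[level - 1])
    result = []
    for child in range(2 * index, min(2 * index + 2, below)):
        result.extend(diff_leaves(a, b, level - 1, child))
    return result


def changed_keys(rows_a, rows_b):
    # Map differing leaf positions back to primary keys.
    tree_a = build_tree([row_hash(row) for _, row in rows_a])
    tree_b = build_tree([row_hash(row) for _, row in rows_b])
    return [rows_a[i][0] for i in diff_leaves(tree_a, tree_b)]
```

<p>Only the hashes along the path to a changed row are compared, which is what makes keeping such a tree locally cheaper than re-reading every row.</p>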
]]></content:encoded>
	</item>
	<item>
		<title>By: chhivhorng</title>
		<link>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-11909</link>
		<author>chhivhorng</author>
		<pubDate>Wed, 27 Jun 2007 09:57:07 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-11909</guid>
		<description>I have a problem with an SQL query.
Could you correct it for me?
This is my query:
$queryselect="select Distinct student.id_student,first_name,last_name,promotion,class
from student,applicationrecieve where student.id_student &lt;&gt; applicationrecieve.id_student";

With this query, I want to select the students in table student that have no matching row in table applicationrecieve.

thanks,
regards,
chhivhorng</description>
		<content:encoded><![CDATA[<p>I have a problem with an SQL query.<br />
Could you correct it for me?<br />
This is my query:<br />
$queryselect="select Distinct student.id_student,first_name,last_name,promotion,class<br />
from student,applicationrecieve where student.id_student &lt;&gt; applicationrecieve.id_student";</p>
<p>With this query, I want to select the students in table student that have no matching row in table applicationrecieve.</p>
<p>thanks,<br />
regards,<br />
chhivhorng</p>
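<p>A join on non-matching ids pairs each student with every application filed by a different student, so it returns nearly every student rather than those without an application. One common way to express &#8220;no matching row&#8221; is an anti-join: a LEFT JOIN followed by a check for NULL on the joined side (a <code>NOT EXISTS</code> subquery is an equivalent alternative). This minimal sketch runs the query against an in-memory SQLite database so it is self-contained; the sample rows and the function name <code>students_without_application</code> are made up for illustration, and the SQL itself should also work in MySQL.</p>

```python
import sqlite3

# Anti-join sketch: students with no row in applicationrecieve. The schema is
# trimmed to the columns the original query names; sqlite3 is used only so
# the example is self-contained.


def students_without_application(conn):
    query = """
        SELECT s.id_student, s.first_name, s.last_name, s.promotion, s.class
        FROM student AS s
        LEFT JOIN applicationrecieve AS a ON a.id_student = s.id_student
        WHERE a.id_student IS NULL  -- no matching application row
        ORDER BY s.id_student
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id_student INTEGER PRIMARY KEY, first_name TEXT,
                          last_name TEXT, promotion TEXT, class TEXT);
    CREATE TABLE applicationrecieve (id_student INTEGER);
    INSERT INTO student VALUES (1, 'Ann', 'Lee', '2007', 'A'),
                               (2, 'Bob', 'Kim', '2007', 'B'),
                               (3, 'Cho', 'Park', '2008', 'A');
    INSERT INTO applicationrecieve VALUES (2);
""")

for row in students_without_application(conn):
    print(row)
```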
]]></content:encoded>
	</item>
	<item>
		<title>By: Xaprb</title>
		<link>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-6793</link>
		<author>Xaprb</author>
		<pubDate>Tue, 15 May 2007 11:57:22 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-6793</guid>
		<description>&lt;p&gt;Hello Fabien.  The issue with the indexing is not scans, but lookups from a child table to its parent tables, including the group-by queries.  These happen potentially many times.  I could benchmark with and without indexes fairly easily and see for sure, but after writing all the queries I'm satisfied the index is important.&lt;/p&gt;

&lt;p&gt;The WHERE clause has proven to be very important, as you guessed.&lt;/p&gt;

&lt;p&gt;I did some testing with real data, and the results are here: &lt;a href="http://www.xaprb.com/blog/2007/03/30/comparison-of-table-sync-algorithms/" rel="nofollow"&gt;Comparison of Table Sync Algorithms&lt;/a&gt;.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Hello Fabien.  The issue with the indexing is not scans, but lookups from a child table to its parent tables, including the group-by queries.  These happen potentially many times.  I could benchmark with and without indexes fairly easily and see for sure, but after writing all the queries I&#8217;m satisfied the index is important.</p>
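<p>A minimal sketch of why repeated child-to-parent lookups benefit from an index, assuming a hypothetical summary table <code>child_summary(child_id, parent_id, chk)</code> in which each row records the parent group it rolls up into. When a parent checksum differs, its children are fetched by <code>parent_id</code>, possibly many times per sync; SQLite&#8217;s <code>EXPLAIN QUERY PLAN</code> (standing in for MySQL&#8217;s <code>EXPLAIN</code>) shows that lookup changing from a full scan to an index search once the index exists. The table layout is an assumption, not the schema the tool actually uses.</p>

```python
import sqlite3

# Hypothetical child-level summary table: each row records the parent group
# it rolls up into. A differing parent triggers a lookup of its children by
# parent_id, which may happen many times during one sync.


def plan_for_child_lookup(conn):
    # Return SQLite's description of how it would run the child lookup.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT child_id, chk FROM child_summary WHERE parent_id = ?",
        (7,),
    ).fetchall()
    return " ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE child_summary (child_id INTEGER, parent_id INTEGER, chk INTEGER)"
)
conn.executemany(
    "INSERT INTO child_summary VALUES (?, ?, ?)",
    [(i, i // 10, i * 31) for i in range(1000)],
)

before = plan_for_child_lookup(conn)  # reports a full SCAN of the table
conn.execute("CREATE INDEX idx_parent ON child_summary (parent_id)")
after = plan_for_child_lookup(conn)   # reports a SEARCH using idx_parent
print(before)
print(after)
```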
<p>The WHERE clause has proven to be very important, as you guessed.</p>
<p>I did some testing with real data, and the results are here: <a href="http://www.xaprb.com/blog/2007/03/30/comparison-of-table-sync-algorithms/" rel="nofollow">Comparison of Table Sync Algorithms</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fabien Coelho</title>
		<link>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-6764</link>
		<author>Fabien Coelho</author>
		<pubDate>Tue, 15 May 2007 09:06:13 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-6764</guid>
		<description>I finally read the stuff (this article and the source code). The various discussions are very interesting.

I'm not sure I fully understand the "index" issue with the bottom-up approach. Basically, each summary table is built once and then scanned just once, so an index built on some attribute would not be amortized. The only exception may be bulk deletes or inserts, but those should not happen.

On the "exploit append-only tables" idea, the bottom-up approach can have a "where" clause on the initial table so that the comparison is only performed on part of the data. Moreover, if the candidate tuples are somehow an identifiable fraction of the table, it might be simpler to just download them directly for comparison; that would be a third algorithm :-)

Do you have performance figures with your tool in different settings?</description>
		<content:encoded><![CDATA[<p>I finally read the stuff (this article and the source code). The various discussions are very interesting.</p>
<p>I&#8217;m not sure I fully understand the &#8220;index&#8221; issue with the bottom-up approach. Basically, each summary table is built once and then scanned just once, so an index built on some attribute would not be amortized. The only exception may be bulk deletes or inserts, but those should not happen.</p>
<p>On the &#8220;exploit append-only tables&#8221; idea, the bottom-up approach can have a &#8220;where&#8221; clause on the initial table so that the comparison is only performed on part of the data. Moreover, if the candidate tuples are somehow an identifiable fraction of the table, it might be simpler to just download them directly for comparison; that would be a third algorithm :-)</p>
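<p>The WHERE-clause idea can be sketched with per-group summaries: each side is reduced to a row count and a checksum sum per group of ids, computed only over rows that match the same filter, and only groups that disagree need closer inspection (or, if they are small enough, a direct download). This is a minimal sketch under stated assumptions: a table <code>t(id, val)</code>, groups of 100 consecutive ids, a <code>crc</code> SQL function registered from Python&#8217;s <code>zlib.crc32</code> (MySQL has a built-in <code>CRC32()</code>), and SQLite standing in for MySQL; all names are hypothetical.</p>

```python
import sqlite3
import zlib

# Hypothetical sketch: per-group (count, checksum) summaries restricted by a
# WHERE clause, so only part of each table is compared. sqlite3 stands in
# for MySQL; the table t(id, val) and all function names are made up.


def crc(text):
    # Row checksum exposed to SQL (MySQL has a built-in CRC32() instead).
    return zlib.crc32(text.encode("utf-8"))


def group_checksums(conn, where, group_size=100):
    conn.create_function("crc", 1, crc)
    # SQLite's / on integers is integer division; MySQL would need DIV.
    # `where` is interpolated directly, so it must come from trusted code.
    query = f"""
        SELECT id / {group_size} AS grp, COUNT(*), SUM(crc(id || '|' || val))
        FROM t
        WHERE {where}
        GROUP BY grp
    """
    return {grp: (cnt, chk) for grp, cnt, chk in conn.execute(query)}


def differing_groups(source, replica, where):
    # Groups whose count or checksum disagree, or that exist on one side only.
    a = group_checksums(source, where)
    b = group_checksums(replica, where)
    return sorted(g for g in a.keys() | b.keys() if a.get(g) != b.get(g))


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn


rows = [(i, "v%d" % i) for i in range(1, 501)]
changed = dict(rows)
changed[50] = "old edit"    # outside the id >= 300 window
changed[500] = "new edit"   # inside it
src = make_db(rows)
dst = make_db(sorted(changed.items()))
```

<p>With a filter such as <code>id >= 300</code>, the edit to the old row is never examined, which is the trade-off of restricting the comparison to the recently appended part of the table.</p>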
<p>Do you have performance figures with your tool in different settings?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Xaprb</title>
		<link>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-4824</link>
		<author>Xaprb</author>
		<pubDate>Tue, 06 Mar 2007 12:57:43 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-4824</guid>
		<description>&lt;p&gt;Rohit, if I read your comment right, you're subtly saying I'm going in a good direction, which is encouraging :-)&lt;/p&gt;

&lt;p&gt;James, you've reinforced my belief that lots of people need a tool that doesn't disrupt replication.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Rohit, if I read your comment right, you&#8217;re subtly saying I&#8217;m going in a good direction, which is encouraging :-)</p>
<p>James, you&#8217;ve reinforced my belief that lots of people need a tool that doesn&#8217;t disrupt replication.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James Holden</title>
		<link>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-4822</link>
		<author>James Holden</author>
		<pubDate>Tue, 06 Mar 2007 11:48:19 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-4822</guid>
		<description>&lt;p&gt;Very interesting. I've already developed a simple tool to do just that, albeit based on a simple row-by-row comparison, with the correcting actions being inserts or updates directly on the slave.

It can recurse all the databases and tables to repair an entire database, or just operate on a single table.

I'd not considered doing the corrective action on the master, but it's an interesting idea.

You're absolutely correct that tables that lack a primary key are barely worth attempting, and so far my script ignores them.

One thing that I've found in practice is that you must perform the synchronisation while replication is actually running. If you don't, you will inevitably end up replicating ahead of the normal replication process and breaking it.

I've found that this tool is useful for bringing up a new replica when the master must not be affected in any way, such as through table locking.

If you do develop a working implementation of your own, do let me know!

James&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Very interesting. I&#8217;ve already developed a simple tool to do just that, albeit based on a simple row-by-row comparison, with the correcting actions being inserts or updates directly on the slave.</p>
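<p>The row-by-row comparison described here can be sketched as follows. This is a minimal illustration, not the tool from this comment: rows are assumed to be already fetched into dicts keyed by primary key, the names <code>corrective_statements</code> and <code>sql_value</code> are hypothetical, values are quoted naively with <code>repr()</code> (not safe SQL escaping), and rows that exist only on the replica are ignored, matching the insert-or-update actions described.</p>

```python
# Hypothetical row-by-row comparison keyed on the primary key. Each table side
# is a dict {pk_value: {column: value}}; the output is a list of corrective
# statements to run on the replica. repr() quoting is a simplification and
# is NOT safe escaping for real SQL.


def sql_value(value):
    return "NULL" if value is None else repr(value)


def corrective_statements(table, pk, master_rows, replica_rows):
    statements = []
    for key in sorted(master_rows):
        row = master_rows[key]
        current = replica_rows.get(key)
        if current is None:
            # Row missing on the replica: insert it.
            cols = ", ".join(row)
            vals = ", ".join(sql_value(v) for v in row.values())
            statements.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
        elif current != row:
            # Row differs: update only the columns whose values changed.
            changes = ", ".join(
                f"{col} = {sql_value(v)}"
                for col, v in row.items()
                if current.get(col) != v
            )
            if changes:
                statements.append(
                    f"UPDATE {table} SET {changes} WHERE {pk} = {sql_value(key)};"
                )
    # Rows present only on the replica are deliberately left untouched.
    return statements
```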
<p>It can recurse all the databases and tables to repair an entire database, or just operate on a single table.</p>
<p>I&#8217;d not considered doing the corrective action on the master, but it&#8217;s an interesting idea.</p>
<p>You&#8217;re absolutely correct that tables that lack a primary key are barely worth attempting, and so far my script ignores them.</p>
<p>One thing that I&#8217;ve found in practice is that you must perform the synchronisation while replication is actually running. If you don&#8217;t, you will inevitably end up replicating ahead of the normal replication process and breaking it.</p>
<p>I&#8217;ve found that this tool is useful for bringing up a new replica when the master must not be affected in any way, such as through table locking.</p>
<p>If you do develop a working implementation of your own, do let me know!</p>
<p>James</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rohit</title>
		<link>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-4814</link>
		<author>Rohit</author>
		<pubDate>Tue, 06 Mar 2007 06:32:50 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2007/03/05/an-algorithm-to-find-and-resolve-data-differences-between-mysql-tables/#comment-4814</guid>
		<description>&lt;p&gt;Nice article!

We have a similar tool called SQLyog Job Agent which incorporates most of what you have discussed in this article. Unfortunately, it is not open-source.

We are always trying to improve the algorithm and look forward to more articles on this topic!&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Nice article!</p>
<p>We have a similar tool called SQLyog Job Agent which incorporates most of what you have discussed in this article. Unfortunately, it is not open-source.</p>
<p>We are always trying to improve the algorithm and look forward to more articles on this topic!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
