<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Xaprb &#187; Test Driven Development</title>
	<atom:link href="http://www.xaprb.com/blog/tag/test-driven-development/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.xaprb.com/blog</link>
	<description>Stay curious!</description>
	<lastBuildDate>Thu, 09 Feb 2012 10:55:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>A productivity tip for test-driven development</title>
		<link>http://www.xaprb.com/blog/2009/05/03/a-productivity-tip-for-test-driven-development/</link>
		<comments>http://www.xaprb.com/blog/2009/05/03/a-productivity-tip-for-test-driven-development/#comments</comments>
		<pubDate>Sun, 03 May 2009 21:35:48 +0000</pubDate>
		<dc:creator>Xaprb</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Test Driven Development]]></category>

		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1062</guid>
		<description><![CDATA[If you code by writing tests that fail, and then fixing the tests by writing the code, then you might find yourself switching to a terminal, running the test, ad nauseam. Part 1 of my tip is to run the test in a loop that takes a single keystroke to trigger: $ while read line; [...]


<strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2008/08/18/how-maatkit-benefits-from-test-driven-development/' rel='bookmark' title='Permanent Link: How Maatkit benefits from test-driven development'>How Maatkit benefits from test-driven development</a></li>
<li><a href='http://www.xaprb.com/blog/2008/08/19/how-to-unit-test-code-that-interacts-with-a-database/' rel='bookmark' title='Permanent Link: How to unit-test code that interacts with a database'>How to unit-test code that interacts with a database</a></li>
<li><a href='http://www.xaprb.com/blog/2007/08/24/google-test-automation-conference-day-1/' rel='bookmark' title='Permanent Link: Google Test Automation Conference, Day 1'>Google Test Automation Conference, Day 1</a></li>
<li><a href='http://www.xaprb.com/blog/2011/07/06/planned-change-in-maatkit-aspersa-development/' rel='bookmark' title='Permanent Link: Planned change in Maatkit &#038; Aspersa development'>Planned change in Maatkit &#038; Aspersa development</a></li>
<li><a href='http://www.xaprb.com/blog/2007/11/26/four-companies-to-sponsor-maatkit-development/' rel='bookmark' title='Permanent Link: Four companies to sponsor Maatkit development'>Four companies to sponsor Maatkit development</a></li>
</ul>]]></description>
			<content:encoded><![CDATA[<p>If you code by writing tests that fail and then making them pass by writing the code, you might find yourself switching to a terminal and running the test, ad nauseam.  Part 1 of my tip is to run the test in a loop that takes a single keystroke to trigger:</p>

<pre>$ while read line; do clear; perl MyTestScript.t; done</pre>

<p>This works with any language, not just Perl; just replace the test command with the right one.  ALT-TAB, press Enter, ALT-TAB back to your editor.</p>

<p>Part 2 of my tip is to make it really easy to drop into the debugger if you want.  Notice the small change here:</p>

<pre>$ while read line; do clear; perl $line MyTestScript.t; done</pre>

<p>Now instead of pressing Enter, you can type &#8220;-d&#8221; and press Enter.  Presto, you&#8217;re in the debugger.  This also works for any language whose interpreter starts its debugger from a command-line flag.  Of course, you can also pass any other arguments you want, such as a flag that enables profiling.</p>

<p><strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2008/08/18/how-maatkit-benefits-from-test-driven-development/' rel='bookmark' title='Permanent Link: How Maatkit benefits from test-driven development'>How Maatkit benefits from test-driven development</a></li>
<li><a href='http://www.xaprb.com/blog/2008/08/19/how-to-unit-test-code-that-interacts-with-a-database/' rel='bookmark' title='Permanent Link: How to unit-test code that interacts with a database'>How to unit-test code that interacts with a database</a></li>
<li><a href='http://www.xaprb.com/blog/2007/08/24/google-test-automation-conference-day-1/' rel='bookmark' title='Permanent Link: Google Test Automation Conference, Day 1'>Google Test Automation Conference, Day 1</a></li>
<li><a href='http://www.xaprb.com/blog/2011/07/06/planned-change-in-maatkit-aspersa-development/' rel='bookmark' title='Permanent Link: Planned change in Maatkit &#038; Aspersa development'>Planned change in Maatkit &#038; Aspersa development</a></li>
<li><a href='http://www.xaprb.com/blog/2007/11/26/four-companies-to-sponsor-maatkit-development/' rel='bookmark' title='Permanent Link: Four companies to sponsor Maatkit development'>Four companies to sponsor Maatkit development</a></li>
</ul>]]></content:encoded>
			<wfw:commentRss>http://www.xaprb.com/blog/2009/05/03/a-productivity-tip-for-test-driven-development/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How to unit-test code that interacts with a database</title>
		<link>http://www.xaprb.com/blog/2008/08/19/how-to-unit-test-code-that-interacts-with-a-database/</link>
		<comments>http://www.xaprb.com/blog/2008/08/19/how-to-unit-test-code-that-interacts-with-a-database/#comments</comments>
		<pubDate>Wed, 20 Aug 2008 00:47:34 +0000</pubDate>
		<dc:creator>Xaprb</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Test Driven Development]]></category>
		<category><![CDATA[testing a database]]></category>
		<category><![CDATA[The Rimm Kaufman Group]]></category>
		<category><![CDATA[unit testing]]></category>

		<guid isPermaLink="false">http://www.xaprb.com/blog/2008/08/19/how-to-unit-test-code-that-interacts-with-a-database/</guid>
		<description><![CDATA[I got some interesting comments on my previous article about unit testing Maatkit, including echoes of my own conversion to the unit-testing religion. One of the objections I&#8217;ve heard a lot about unit-testing is how it&#8217;s impossible to test code that talks to a database. &#8220;It&#8217;s too hard,&#8221; they say. &#8220;Oh, it&#8217;s easy to test [...]


<strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2006/05/16/how-to-refactor-without-rewriting-unit-tests/' rel='bookmark' title='Permanent Link: How to write unit tests for ease of refactoring'>How to write unit tests for ease of refactoring</a></li>
<li><a href='http://www.xaprb.com/blog/2008/08/18/how-maatkit-benefits-from-test-driven-development/' rel='bookmark' title='Permanent Link: How Maatkit benefits from test-driven development'>How Maatkit benefits from test-driven development</a></li>
<li><a href='http://www.xaprb.com/blog/2009/05/03/a-productivity-tip-for-test-driven-development/' rel='bookmark' title='Permanent Link: A productivity tip for test-driven development'>A productivity tip for test-driven development</a></li>
<li><a href='http://www.xaprb.com/blog/2011/11/07/when-documentation-is-code/' rel='bookmark' title='Permanent Link: When documentation is code'>When documentation is code</a></li>
<li><a href='http://www.xaprb.com/blog/2007/08/24/google-test-automation-conference-day-1/' rel='bookmark' title='Permanent Link: Google Test Automation Conference, Day 1'>Google Test Automation Conference, Day 1</a></li>
</ul>]]></description>
			<content:encoded><![CDATA[<p>I got some interesting comments on my previous article about <a href="http://www.xaprb.com/blog/2008/08/18/how-maatkit-benefits-from-test-driven-development/">unit testing Maatkit</a>, including echoes of my own conversion to the unit-testing religion.  One of the objections I&#8217;ve heard a lot about unit-testing is how it&#8217;s impossible to test code that talks to a database.  &#8220;It&#8217;s too hard,&#8221; they say.  &#8220;Oh, it&#8217;s easy to test a module that calculates a square root, but a database?  Way too much work!&#8221;</p>

<span id="more-556"></span>

<p><strong>Note:</strong> As commenters have pointed out, I&#8217;m not necessarily using &#8220;unit&#8221; in the agreed-upon way here.  Everything I say can be applied to ultra-pure unit testing too, but I go beyond that.  I will hold fast to my assertions about mocking though *grin*</p>

<h3>Is it really impossible or even hard?</h3>

<p>I disagree.  In one of my previous articles I said that <a href="http://www.rimmkaufman.com/rkgblog/">The Rimm-Kaufman Group</a>, my previous employer, has a comprehensive unit-test suite.  When I say comprehensive I mean it: database interaction is fully tested, too.  I know because I was heavily involved in building it.  Even extremely complex things like big reports that are generated from lots of data are tested.  And believe me, sharding the databases would have been much harder without complete code coverage.  It&#8217;s really not that complicated to unit-test against a database, and it&#8217;s so worth it.  Here are some hints about how you can do this.</p>

<p>There are many ways to do it, but I&#8217;ll just describe the basics of the system I helped build.  There are several moving parts to the test suite (&#8220;<a href="http://c2.com/cgi/wiki?SmokeTest">smoke</a>&#8221;), but one of them sets a magical environment variable.  And then, all code that connects to a database server magically gets back a different database connection from the create_me_a_connection() function.  This is because there is a database connection abstraction library that respects the environment variable.  It&#8217;s really pretty simple for the most part; instead of doing DBI->connect(&#8230;) you just call this function, which is a thin wrapper that hands back a connection object.</p>

<p>This wrapper is itself unit-tested thoroughly, too.  This ensures that when some code is being run from a test, it cannot (I mean cannot!) connect to a production database, and vice versa.  There are some conventions about production and test servers that make sure the abstraction library can tell for sure.  If there&#8217;s any confusion, of course, it will die in a non-recoverable way.  Safety first.</p>
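
<p>To make the idea concrete, here is a minimal Perl sketch of such a wrapper.  It is not RKG&#8217;s code: the <code>APP_TEST_MODE</code> variable, the host names, the ports, and the rule that test servers live on 127.0.0.1 are all assumptions invented for illustration, and the sketch returns a DSN string rather than calling DBI so that it runs without a database server.</p>

<pre>use strict;
use warnings;

# Hypothetical server conventions; none of these values come from the real system.
my %SERVERS = (
    production => { host => 'db.example.com', port => 3306 },
    test       => { host => '127.0.0.1',      port => 13306 },
);

# Decide which server the caller gets, based only on the environment.
sub create_me_a_connection {
    my $mode   = $ENV{APP_TEST_MODE} ? 'test' : 'production';
    my $server = $SERVERS{$mode};

    # Safety first: a test run must resolve to a local server and a
    # production run must not.  Any disagreement is fatal.
    my $is_local = $server->{host} eq '127.0.0.1';
    die "Refusing to connect: $mode run resolved to $server->{host}\n"
        if ($mode eq 'test' xor $is_local);

    # A real wrapper would pass this DSN to DBI and return the handle.
    return "DBI:mysql:host=$server->{host};port=$server->{port}";
}

print create_me_a_connection(), "\n";</pre>

<p>Because calling code never builds its own connection, the decision about which server to use lives in exactly one place, and that one place can be tested thoroughly.</p>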

<h3>Building a good development environment</h3>

<p>Just as each developer has their own copy of the code from version control, each developer has their own private database server running on the dev machine.  There are some simple conventions that make this possible: Unix user ID plus a constant for the port number, etc.  It&#8217;s really quite easy.  The private database server is a slightly modified version of <a href="https://launchpad.net/mysql-sandbox">Giuseppe Maxia&#8217;s MySQL Sandbox tool</a>.  It can be torn down and set up afresh as desired.  It is wiped clean and re-filled at the start of every test, with a small, tightly focused dataset carefully chosen to represent the conditions the code is supposed to work with.  (Each test has its own dataset).</p>
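
<p>As an illustration of that kind of convention, a per-developer port could be derived like this in Perl.  The base value 13000 and the function name are assumptions; the only rule stated above is the Unix user ID plus a constant.</p>

<pre>use strict;
use warnings;
use English qw(-no_match_vars);

# Hypothetical constant; choose one that keeps ports clear of system services.
my $PORT_BASE = 13000;

# Each developer's private server listens on a port derived from their user ID.
sub private_db_port {
    my ($uid) = @_;
    my $port = $PORT_BASE + $uid;
    die "User ID $uid gives port $port, which is out of range\n" if $port > 65535;
    return $port;
}

print private_db_port($UID), "\n";</pre>

<p>A rule like this needs no configuration file: any script can work out which server belongs to the person running it.</p>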

<p>If this sounds like a system that can&#8217;t work on a large scale, well, it does.  That&#8217;s the secret sauce that I won&#8217;t reveal in this post.  (It&#8217;s my past employer after all, and I can&#8217;t go revealing everything about them, can I?)  You just have to be smart about it.  When a database is central to your business, you either figure out how to get this right, or you pay the price in lost time and poor code quality.</p>

<p>The other developers there and I (another secret: it&#8217;s a small team; <a href="http://www.craigslist.org/">small teams build great things</a>) built several quick utilities to help develop unit tests against a database.  There are utilities to get a minimal necessary dataset for testing and dump it into a file that can be loaded by the test.  There are utilities that can migrate schemas and update the tests to match the schema changes.  And so on, and so on.  This is possible because of careful planning for testability, and really smart things like super-consistent and sensible naming conventions for database objects.  (Ruby on Rails owes a lot of its success to simple things like this, too.  Conventions are really powerful.)  Maybe I&#8217;ll write about the database naming conventions some other time.  I have to credit Alan Rimm-Kaufman a lot for designing those conventions.  It was a stroke of genius.</p>

<h3>Things to avoid</h3>

<p>There are several things I <em>do not</em> recommend doing when you unit-test code that talks to a database.  I&#8217;ll just mention a couple:</p>

<ul>
<li>Don&#8217;t <a href="http://c2.com/cgi/wiki?MockObject">mock</a> anything!  In general I think mocking is the devil.  Most of the mock objects I&#8217;ve ever seen reflected a propensity to <a href="http://www.xaprb.com/blog/2006/05/16/how-to-refactor-without-rewriting-unit-tests/">test an implementation instead of a behavior</a>, which is also the devil.  Write all your tests against a test instance of something real, and do not mock up a database to test against.  It is a rabbit-hole that you will not emerge from easily.</li>
<li>Never let a test connect to a production database.  Never, ever.  Worlds of hurt will follow.  Not only are you risking your production data, but what about the risk to your code?  You&#8217;re testing against things that will almost certainly change and break your tests; and you&#8217;re possibly polluting your live data with testing data and/or changing live data from the tests.</li>
<li>I also recommend developing unit tests for your current database functionality if you&#8217;re thinking about changing it much.  <a href="http://dev.mysql.com/doc/en/server-sql-mode.html">Don&#8217;t like MySQL&#8217;s lax error handling?  Plan to set the SQL_MODE to something stricter?</a>  Dive into that database abstraction library and make your tests run in strict mode first by setting SQL_MODE on every new connection that&#8217;s created when running inside a test; fix all the breakage in the test suite; feel sure that your code isn&#8217;t going to break in production.  That was easy!</li>
</ul>
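
<p>The strict-mode idea in the last item might look like the following Perl sketch.  The <code>APP_TEST_MODE</code> variable and the function name are hypothetical, and <code>STRICT_ALL_TABLES</code> is just one of the stricter modes MySQL offers.  The small recording class stands in for a database handle only so the sketch runs without a server; in a real suite the handle would come from DBI and point at a test instance.</p>

<pre>use strict;
use warnings;

# Hypothetical hook for the connection wrapper to call on every new handle.
sub apply_test_settings {
    my ($dbh) = @_;
    $dbh->do(q{SET SESSION sql_mode = 'STRICT_ALL_TABLES'}) if $ENV{APP_TEST_MODE};
    return $dbh;
}

# Stand-in for a DBI handle: it only records the statements it is given.
{
    package RecordingHandle;
    sub new { my ($class) = @_; return bless { statements => [] }, $class }
    sub do  { my ($self, $sql) = @_; push @{ $self->{statements} }, $sql; return 1 }
}

$ENV{APP_TEST_MODE} = 1;
my $dbh = apply_test_settings(RecordingHandle->new);
print "$_\n" for @{ $dbh->{statements} };</pre>

<p>Since every connection made during a test passes through the wrapper, one line there switches the whole suite to the stricter mode while production connections stay untouched.</p>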

<h3>Summary</h3>

<p>Once your creative juices get flowing, you&#8217;ll see tons of places your unit test suite can help you out.</p>

<p>If you&#8217;re in the Oracle or SQL Server world, or any other world where you can&#8217;t just set up and discard database instances at will due to licensing problems, you&#8217;re going to have to be a little more inventive.  But you can still do it.  (Don&#8217;t you wish you&#8217;d chosen <a href="http://www.fsf.org/">Freedom</a>?)  And unit tests are just as beneficial for apps based on Oracle as they are for MySQL.</p>

<p>Have fun!  Go forth and test some more!</p>

<p><strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2006/05/16/how-to-refactor-without-rewriting-unit-tests/' rel='bookmark' title='Permanent Link: How to write unit tests for ease of refactoring'>How to write unit tests for ease of refactoring</a></li>
<li><a href='http://www.xaprb.com/blog/2008/08/18/how-maatkit-benefits-from-test-driven-development/' rel='bookmark' title='Permanent Link: How Maatkit benefits from test-driven development'>How Maatkit benefits from test-driven development</a></li>
<li><a href='http://www.xaprb.com/blog/2009/05/03/a-productivity-tip-for-test-driven-development/' rel='bookmark' title='Permanent Link: A productivity tip for test-driven development'>A productivity tip for test-driven development</a></li>
<li><a href='http://www.xaprb.com/blog/2011/11/07/when-documentation-is-code/' rel='bookmark' title='Permanent Link: When documentation is code'>When documentation is code</a></li>
<li><a href='http://www.xaprb.com/blog/2007/08/24/google-test-automation-conference-day-1/' rel='bookmark' title='Permanent Link: Google Test Automation Conference, Day 1'>Google Test Automation Conference, Day 1</a></li>
</ul>]]></content:encoded>
			<wfw:commentRss>http://www.xaprb.com/blog/2008/08/19/how-to-unit-test-code-that-interacts-with-a-database/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>How Maatkit benefits from test-driven development</title>
		<link>http://www.xaprb.com/blog/2008/08/18/how-maatkit-benefits-from-test-driven-development/</link>
		<comments>http://www.xaprb.com/blog/2008/08/18/how-maatkit-benefits-from-test-driven-development/#comments</comments>
		<pubDate>Mon, 18 Aug 2008 13:54:24 +0000</pubDate>
		<dc:creator>Xaprb</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Maatkit]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[CRC32]]></category>
		<category><![CDATA[Daniel Nichter]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Test Driven Development]]></category>
		<category><![CDATA[The Rimm Kaufman Group]]></category>

		<guid isPermaLink="false">http://www.xaprb.com/blog/2008/08/18/how-maatkit-benefits-from-test-driven-development/</guid>
		<description><![CDATA[Over in Maatkit-land, Daniel Nichter and I practice test-first programming, AKA test-driven development. That is, we write tests for each new feature or to catch regressions on each bug we fix. And &#8212; this is crucial &#8212; we write the tests before we write the code.* The tests should initially fail, which is a validation [...]


<strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2009/05/03/a-productivity-tip-for-test-driven-development/' rel='bookmark' title='Permanent Link: A productivity tip for test-driven development'>A productivity tip for test-driven development</a></li>
<li><a href='http://www.xaprb.com/blog/2008/08/19/how-to-unit-test-code-that-interacts-with-a-database/' rel='bookmark' title='Permanent Link: How to unit-test code that interacts with a database'>How to unit-test code that interacts with a database</a></li>
<li><a href='http://www.xaprb.com/blog/2011/07/06/planned-change-in-maatkit-aspersa-development/' rel='bookmark' title='Permanent Link: Planned change in Maatkit &#038; Aspersa development'>Planned change in Maatkit &#038; Aspersa development</a></li>
<li><a href='http://www.xaprb.com/blog/2007/11/26/four-companies-to-sponsor-maatkit-development/' rel='bookmark' title='Permanent Link: Four companies to sponsor Maatkit development'>Four companies to sponsor Maatkit development</a></li>
<li><a href='http://www.xaprb.com/blog/2007/08/24/google-test-automation-conference-day-1/' rel='bookmark' title='Permanent Link: Google Test Automation Conference, Day 1'>Google Test Automation Conference, Day 1</a></li>
</ul>]]></description>
			<content:encoded><![CDATA[<p>Over in <a href="http://www.maatkit.org/">Maatkit</a>-land, <a href="http://hackmysql.com/">Daniel Nichter</a> and I practice <a href="http://en.wikipedia.org/wiki/Test-driven_development">test-first programming, AKA test-driven development</a>.  That is, we write tests for each new feature or to catch regressions on each bug we fix.  And, crucially, we write the tests <em>before</em> we write the code.*  The tests should initially fail and then pass once the code is written, which validates both that the new code actually works and that the tests actually verify this.  If we don&#8217;t first write a failing test case, then our code lacks a very important guarantee: &#8220;if you break this code, then the test case will tell you so.&#8221; (A test that doesn&#8217;t fail when the code fails isn&#8217;t worth writing.)</p>

<span id="more-553"></span>

<p>Most of the time when I do this, I write a test, it fails because I haven&#8217;t written any code yet, and I then go do some kind of clean-room coding.  Then I run the test and it&#8217;s busted, and I have to go back to the code and figure out why, and after a few more tries I get it working.  And then it feels great.  (That&#8217;s the other thing about test-first coding.  It&#8217;s really satisfying, like cooking the perfect dinner, arranging the plates beautifully and then eating.)</p>

<p>This time I wanted to write a pure-Perl implementation of CRC32, and embed it in mk-table-checksum.  We try really hard never to rely on external modules, even modules that ought to be distributed with Perl itself.  That keeps Maatkit as portable as possible and makes sure there is no installation hell.  You can generally just get and run the Maatkit tools with no installation.  So I referred to an existing CRC32 implementation, in <a href="http://search.cpan.org/~fays/Digest-Crc32-0.01/Crc32.pm">Digest::Crc32</a>.  I wrote a test by referring to the value I got from MySQL&#8217;s built-in CRC32:</p>

<pre>mysql> select crc32('hello world');
+----------------------+
| crc32('hello world') |
+----------------------+
|            222957957 | 
+----------------------+
1 row in set (0.00 sec)
</pre>

<p>Here&#8217;s the test:</p>

<pre>is($c-&gt;crc32('hello world'), 222957957, 'CRC32 of hello world');</pre>

<p>CRC32 is CRC32, so my code had better agree with a working implementation.  And then I wrote the code, which is a refactoring of the math in the module I linked to above.  And then I ran the test, and it Just Passed with no further ado.  w00t!  This is pretty much a historic first for me!  I thought at first that I&#8217;d screwed something up with the test, but I checked again.  This is like getting a hole-in-one for me :-)  So I just thought I&#8217;d share it with you.  It feels <strong>awesome</strong>.</p>
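
<p>For readers curious what a pure-Perl CRC32 can look like, here is a compact bitwise sketch.  It is not the code that went into mk-table-checksum and not a copy of Digest::Crc32; it is the textbook bit-at-a-time algorithm with the reflected polynomial 0xEDB88320, written as a plain function rather than the method used in the test above, and it assumes the input is a byte string.</p>

<pre>use strict;
use warnings;

# Bitwise CRC-32 (reflected polynomial 0xEDB88320), with no lookup table
# and no external modules.
sub crc32 {
    my ($string) = @_;
    my $crc = 0xFFFFFFFF;
    for my $byte (unpack 'C*', $string) {
        $crc ^= $byte;
        for (1 .. 8) {
            # Shift right one bit; XOR in the polynomial if the bit shifted out was 1.
            $crc = $crc % 2 ? ($crc >> 1) ^ 0xEDB88320 : $crc >> 1;
        }
    }
    return $crc ^ 0xFFFFFFFF;
}

print crc32('hello world'), "\n";</pre>

<p>For &#8216;hello world&#8217; this prints 222957957, the same value MySQL returned above.</p>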

<p>If you&#8217;re not doing test-first coding, you ought to give it a try.  If you are conscientious about writing tests first, your code will always be easy to test.  If you don&#8217;t, you write untestable code.  Then it&#8217;s tough or impossible to ever get tests on it, and you spend the rest of your life wasting time on stupid bugs and slow, fearful development, never knowing what else you are breaking with your &#8220;fixes.&#8221;</p>

<p>Test-driven development is one reason <a href="http://www.rimmkaufman.com/">The Rimm-Kaufman Group&#8217;s</a> in-house bidding system blows away their competition.  (RKG is my previous employer.)  The comprehensive unit-test suite lets you know right away if you&#8217;ve broken something.  That keeps the code clean and makes it possible to be extremely productive.  I remember once when one of my co-workers there implemented a major feature in a very short time.  It was also incredibly helpful when sharding the databases (anyone ever done this without a test suite?  Would you like to share how much of your systems broke during sharding?  It was almost a non-event at RKG).  The people I worked with before I joined RKG looked at me as if I were an alien when I tried to explain that this was possible.</p>

<p>If you&#8217;re thinking that your code is not &#8220;that kind of code,&#8221; that &#8220;only certain kinds of code lend themselves to unit tests,&#8221; then stop. I&#8217;ve heard this before, and you&#8217;re wrong.  It&#8217;s only &#8220;untestable&#8221; because you didn&#8217;t write tests first.  Write tests first, and your code &#8212; all of it! &#8212; will be &#8220;that kind of code&#8221; that is testable.  It&#8217;s hard.  No one says it&#8217;s not; good programming is much harder than sloppy programming.  But it&#8217;s well worth it.</p>

<p>Converting untested, untestable code into tested code is not so much fun, though.  And in my experience you&#8217;ll rarely be rewarded for it, and your coworkers will not appreciate you raising the bar for them.  Maybe you need a new job.  I hear RKG is hiring.  Did I mention that their codebase is built from the ground up on unit tests?</p>

<p>* OK, we&#8217;re not perfectly disciplined about this, but we&#8217;re pretty good about it.</p>

<p><strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2009/05/03/a-productivity-tip-for-test-driven-development/' rel='bookmark' title='Permanent Link: A productivity tip for test-driven development'>A productivity tip for test-driven development</a></li>
<li><a href='http://www.xaprb.com/blog/2008/08/19/how-to-unit-test-code-that-interacts-with-a-database/' rel='bookmark' title='Permanent Link: How to unit-test code that interacts with a database'>How to unit-test code that interacts with a database</a></li>
<li><a href='http://www.xaprb.com/blog/2011/07/06/planned-change-in-maatkit-aspersa-development/' rel='bookmark' title='Permanent Link: Planned change in Maatkit &#038; Aspersa development'>Planned change in Maatkit &#038; Aspersa development</a></li>
<li><a href='http://www.xaprb.com/blog/2007/11/26/four-companies-to-sponsor-maatkit-development/' rel='bookmark' title='Permanent Link: Four companies to sponsor Maatkit development'>Four companies to sponsor Maatkit development</a></li>
<li><a href='http://www.xaprb.com/blog/2007/08/24/google-test-automation-conference-day-1/' rel='bookmark' title='Permanent Link: Google Test Automation Conference, Day 1'>Google Test Automation Conference, Day 1</a></li>
</ul>]]></content:encoded>
			<wfw:commentRss>http://www.xaprb.com/blog/2008/08/18/how-maatkit-benefits-from-test-driven-development/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Progress on Maatkit bounty, part 3</title>
		<link>http://www.xaprb.com/blog/2007/12/06/progress-on-maatkit-bounty-part-3/</link>
		<comments>http://www.xaprb.com/blog/2007/12/06/progress-on-maatkit-bounty-part-3/#comments</comments>
		<pubDate>Thu, 06 Dec 2007 11:29:44 +0000</pubDate>
		<dc:creator>Xaprb</dc:creator>
				<category><![CDATA[checksum]]></category>
		<category><![CDATA[Giuseppe Maxia]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[synchronization]]></category>
		<category><![CDATA[Test Driven Development]]></category>

		<guid isPermaLink="false">http://www.xaprb.com/blog/2007/12/06/progress-on-maatkit-bounty-part-3/</guid>
		<description><![CDATA[<p>This is the last day I'm taking off work to hack on mk-table-sync, and I thought it was time for (yet another) progress report.  Here's what I have done so far.  (Click through to the full article to read the details).</p>


<strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2007/12/03/progress-on-maatkit-bounty-part-2/' rel='bookmark' title='Permanent Link: Progress on Maatkit bounty, part 2'>Progress on Maatkit bounty, part 2</a></li>
<li><a href='http://www.xaprb.com/blog/2007/11/30/progress-on-maatkit-bounty/' rel='bookmark' title='Permanent Link: Progress on Maatkit bounty'>Progress on Maatkit bounty</a></li>
<li><a href='http://www.xaprb.com/blog/2007/12/06/progress-on-maatkit-bounty-part-4/' rel='bookmark' title='Permanent Link: Progress on Maatkit bounty, part 4'>Progress on Maatkit bounty, part 4</a></li>
<li><a href='http://www.xaprb.com/blog/2007/11/29/maatkit-bounty-begins-tomorrow/' rel='bookmark' title='Permanent Link: Maatkit bounty begins tomorrow'>Maatkit bounty begins tomorrow</a></li>
<li><a href='http://www.xaprb.com/blog/2008/02/29/how-to-sync-tables-in-master-master-mysql-replication/' rel='bookmark' title='Permanent Link: How to sync tables in master-master MySQL replication'>How to sync tables in master-master MySQL replication</a></li>
</ul>]]></description>
			<content:encoded><![CDATA[<p>This is the last day I&#8217;m taking off work to hack on mk-table-sync, and I thought it was time for (yet another) progress report.  Here&#8217;s what I have done so far:</p>

<ul>
<li>All the code, except for a tiny bit of &#8220;glue&#8221; and &#8220;setup&#8221; code, is in modules.</li>
<li>Lots more tests for the modules.</li>
<li>A new sync algorithm (I still haven&#8217;t rewritten the top-down and bottom-up, which are designed for network efficiency more than MySQL efficiency, and are very complicated).  This algorithm is called &#8220;Chunk&#8221; and is based on the chunking module I&#8217;m re-using from two of the other tools.  This allows syncing the table a bit at a time to avoid locking it so much.</li>
<li>The tool chooses its own parameters, including choosing the sync algorithm automatically by examining indexes.</li>
<li>Proper exit codes, as well as several other smaller issues requested via bug reports.</li>
<li>The tool now syncs entire servers.  That is, you don&#8217;t have to specify a table.  It&#8217;ll find all the tables and just sync them.</li>
<li>The tool can sync many servers.  You give it five servers, it will treat the first as the source, and sync every table in the source to each of the four remaining servers in turn.</li>
<li>It can work via replication.  It can discover a master&#8217;s slaves via SHOW SLAVE HOSTS and sync each slave to the master.  You can also point it at a slave and it&#8217;ll discover the master, connect to it, and sync the slave to the master.</li>
<li>It integrates with mk-table-checksum&#8217;s results.  If you&#8217;ve given the <code>--replicate</code> option to mk-table-checksum, the slave&#8217;s results are stored in a table.  It can read that table and sync anything marked as different.  This can be combined with sync-to-master and auto-discover-slaves functionality.</li>
<li>Lots of other bugs and problems are gone simply because I&#8217;m using the modules I wrote for other tools.  This includes issues with table parsing, identifier quoting, etc.  As an aside, I have to roll my own for almost everything, because I can&#8217;t rely on things like DBI&#8217;s <code>quote_identifier()</code> function; it does not work in earlier versions, which are amazingly common in the real world.</li>
</ul>
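
<p>The chunking idea mentioned in the list above can be sketched in a few lines of Perl.  This is not mk-table-sync&#8217;s actual algorithm, which has to cope with real indexes and non-numeric keys; it only shows how an integer key range might be cut into pieces that can be checked and synced one at a time.  The function name and the example column and range are assumptions.</p>

<pre>use strict;
use warnings;

# Split an integer key range into BETWEEN clauses covering at most
# $size key values each.
sub chunk_where_clauses {
    my ($column, $min, $max, $size) = @_;
    my @clauses;
    my $start = $min;
    until ($start > $max) {
        my $end = $start + $size - 1;
        $end = $max if $end > $max;
        push @clauses, "$column BETWEEN $start AND $end";
        $start = $end + 1;
    }
    return @clauses;
}

print "$_\n" for chunk_where_clauses('film_id', 1, 1000, 400);</pre>

<p>With these arguments the sketch prints three clauses, covering 1 to 400, 401 to 800, and 801 to 1000.  Each clause can be appended to a query so that only a slice of the table is examined, and locked, at any moment.</p>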

<p>Whew!  So what isn&#8217;t done yet?</p>

<ul>
<li>Bi-directional syncing.</li>
<li>The Nibble sync algorithm.  It will be preferred over Chunk and can be used in more cases.</li>
<li>Documentation.</li>
<li>Full support for wide characters.  (This is non-trivial in Perl.  I need to research it.  A partial solution might not be hard, but I&#8217;m worried about the versions included in, for example, RHEL 3, which is very widely used.)</li>
<li>Updating other tools to work right with the changes to shared code.</li>
<li>Locking and transaction code.  The tool will ultimately use FOR UPDATE/LOCK IN SHARE MODE automatically on InnoDB tables instead of locking them, for example.</li>
</ul>

<p>Here&#8217;s a sample of what it can do, using a replication sandbox I set up with Giuseppe&#8217;s <a href="http://sourceforge.net/projects/mysql-sandbox">MySQL Sandbox</a>.  The sandbox contains a copy of the Sakila sample database.  I&#8217;ll just mangle a few films on the slaves:</p>

<pre>baron@kanga:~$ cd rsandbox_5_0_45/
baron@kanga:~/rsandbox_5_0_45$ ./s1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.0.45-log MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

slave1 [localhost] {msandbox} ((none)) &gt; update sakila.film set title='academy dinosaur2' limit 12;
Query OK, 12 rows affected, 12 warnings (0.07 sec)
Rows matched: 12  Changed: 12  Warnings: 0

slave1 [localhost] {msandbox} ((none)) &gt; Bye
baron@kanga:~/rsandbox_5_0_45$ ./s2
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.0.45-log MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

slave2 [localhost] {msandbox} ((none)) &gt; update sakila.film set title='academy dinosaur2' limit 1;
Query OK, 1 row affected, 1 warning (0.05 sec)
Rows matched: 1  Changed: 1  Warnings: 0

slave2 [localhost] {msandbox} ((none)) &gt; Bye</pre>

<p>OK, now I&#8217;ve messed up the first 12 films on one slave, and the first one on the other.  I could just go ahead and sync them right away, but first I&#8217;ll do a table checksum to demonstrate that functionality:</p>

<pre>baron@kanga:~/rsandbox_5_0_45$ mk-table-checksum --replicate=test.checksum --port=16045 127.0.0.1 -q
</pre>

<p>And now I&#8217;ll tell the sync tool to go fix the differences the checksum revealed:</p>

<pre>
baron@kanga:~/rsandbox_5_0_45$ mk-table-sync  --replicate=test.checksum h=127.0.0.1,P=16045 -vx
# Syncing P=16046,h=127.0.0.1
# DELETE INSERT UPDATE ALGORITHM DATABASE.TABLE
#      0      0     12 Chunk     sakila.film
#      0      0      0 Chunk     sakila.film_text
# Syncing P=16047,h=127.0.0.1
# DELETE INSERT UPDATE ALGORITHM DATABASE.TABLE
#      0      0      0 Chunk     sakila.film
#      0      0      0 Chunk     sakila.film_text
baron@kanga:~/rsandbox_5_0_45$ 
</pre>

<p>Pretty easy, huh?  Take a look at the output: the first thing it did was fix the 12 films I changed on the first slave.  <code>sakila.film</code> has a trigger that updates <code>sakila.film_text</code>, so that table got changed too.  The checksum tool caught this difference, but it was gone by the time the sync tool examined the table, again due to the trigger.  On the second slave, no differences were found at all, because the statements that fixed the first slave were executed on the master and replicated, which automatically fixed the second slave as well.  (This won&#8217;t always be the case, but it worked in this example.)</p>

<p>While I&#8217;d love to continue building the perfect beast, I&#8217;m going to have to call it quits around noon today and start cleaning up, writing the documentation, and getting ready to release the code.  I&#8217;m not sure how much I&#8217;ll finish in that time.</p>

<p>By the way, anyone who wants to is welcome to get the code from the <a href="http://code.google.com/p/maatkit/">Maatkit</a> SVN repository!  I never make a big deal out of that because I generally assume people want to run released code, but SVN is there if you want it&#8230;</p>

<p><strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2007/12/03/progress-on-maatkit-bounty-part-2/' rel='bookmark' title='Permanent Link: Progress on Maatkit bounty, part 2'>Progress on Maatkit bounty, part 2</a></li>
<li><a href='http://www.xaprb.com/blog/2007/11/30/progress-on-maatkit-bounty/' rel='bookmark' title='Permanent Link: Progress on Maatkit bounty'>Progress on Maatkit bounty</a></li>
<li><a href='http://www.xaprb.com/blog/2007/12/06/progress-on-maatkit-bounty-part-4/' rel='bookmark' title='Permanent Link: Progress on Maatkit bounty, part 4'>Progress on Maatkit bounty, part 4</a></li>
<li><a href='http://www.xaprb.com/blog/2007/11/29/maatkit-bounty-begins-tomorrow/' rel='bookmark' title='Permanent Link: Maatkit bounty begins tomorrow'>Maatkit bounty begins tomorrow</a></li>
<li><a href='http://www.xaprb.com/blog/2008/02/29/how-to-sync-tables-in-master-master-mysql-replication/' rel='bookmark' title='Permanent Link: How to sync tables in master-master MySQL replication'>How to sync tables in master-master MySQL replication</a></li>
</ul>]]></content:encoded>
			<wfw:commentRss>http://www.xaprb.com/blog/2007/12/06/progress-on-maatkit-bounty-part-3/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Growth limits of open-source vis-a-vis MySQL Toolkit</title>
		<link>http://www.xaprb.com/blog/2007/11/05/growth-limits-of-open-source-vis-a-vis-mysql-toolkit/</link>
		<comments>http://www.xaprb.com/blog/2007/11/05/growth-limits-of-open-source-vis-a-vis-mysql-toolkit/#comments</comments>
		<pubDate>Mon, 05 Nov 2007 14:50:02 +0000</pubDate>
		<dc:creator>Xaprb</dc:creator>
				<category><![CDATA[Kurt Vonnegut]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Si Chen]]></category>
		<category><![CDATA[sourceforge]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Test Driven Development]]></category>
		<category><![CDATA[unit testing]]></category>
		<category><![CDATA[Zmanda]]></category>

		<guid isPermaLink="false">http://www.xaprb.com/blog/2007/11/05/growth-limits-of-open-source-vis-a-vis-mysql-toolkit/</guid>
		<description><![CDATA[<p><a href="http://opensourcestrategies.blogspot.com/2007/10/limits-of-open-source.html">Si Chen wrote recently about the growth limits of open-source projects</a>.  He points out that as a project becomes larger, it gets harder to maintain.  I can only agree.  As the <a href="http://mysqltoolkit.sourceforge.net">MySQL Toolkit</a> project has grown, it's become significantly more work to maintain, document, and enhance.</p>


<strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2008/12/23/does-mysql-really-have-an-open-source-business-model/' rel='bookmark' title='Permanent Link: Does MySQL really have an open-source business model?'>Does MySQL really have an open-source business model?</a></li>
<li><a href='http://www.xaprb.com/blog/2008/05/14/mysql-free-software-but-not-open-source/' rel='bookmark' title='Permanent Link: MySQL: Free Software but not Open Source'>MySQL: Free Software but not Open Source</a></li>
<li><a href='http://www.xaprb.com/blog/2009/03/08/making-maatkit-more-open-source-one-step-at-a-time/' rel='bookmark' title='Permanent Link: Making Maatkit more Open Source one step at a time'>Making Maatkit more Open Source one step at a time</a></li>
<li><a href='http://www.xaprb.com/blog/2009/04/29/what-does-an-open-source-sales-model-look-like/' rel='bookmark' title='Permanent Link: What does an open source sales model look like?'>What does an open source sales model look like?</a></li>
<li><a href='http://www.xaprb.com/blog/2011/07/04/measuring-open-source-success-by-jobs/' rel='bookmark' title='Permanent Link: Measuring open-source success by jobs'>Measuring open-source success by jobs</a></li>
</ul>]]></description>
			<content:encoded><![CDATA[<p><a href="http://opensourcestrategies.blogspot.com/2007/10/limits-of-open-source.html">Si Chen wrote recently about the growth limits of open-source projects</a>.  He points out that as a project becomes larger, it gets harder to maintain.  I can only agree.  As the <a href="http://code.google.com/p/maatkit">MySQL Toolkit</a> project has grown, it&#8217;s become significantly more work to maintain, document, and enhance.  (This is why I&#8217;m asking you to <a href="http://www.xaprb.com/blog/2007/10/31/mysql-table-sync-bounty-lets-do-it/">sponsor me for a week off my regular job to work on MySQL Table Sync</a>, by the way.  Please toss some money in the hat.)</p>

<p>Rewriting code so it&#8217;s testable is a major focus for me now.  Some of these tools have gotten complicated enough that I can&#8217;t keep track of all the code.  In other words, they&#8217;re collapsing under their own weight.</p>

<p>Back in the project&#8217;s humble beginnings, it seemed adequate to just copy and paste a few lines here and there; after all, these are just scripts, right?  Right.  So I&#8217;ll just copy a few lines of code that do command-line option parsing and help screens.  Hey, it turns out that several of the tools can connect to more than one server, so simple <code>-u</code>, <code>-h</code> and <code>-p</code> options won&#8217;t do; so I invent a DSN-like notation that lets the tools connect to an arbitrary number of servers.  Copy and paste that code, too.  It&#8217;s only ten lines; no big deal.  Pretty soon I find out that many of the standard Perl modules aren&#8217;t available for a lot of people.  And even when they&#8217;re available, people have old versions and can&#8217;t upgrade, so I can&#8217;t rely on basic things like the <code>quote_identifier()</code> function in the DBI module; time to write my own.  Well, that&#8217;s only a single line!  Surely that&#8217;s okay to copy and paste.</p>
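
<p>To give a feel for how small these copied pieces are, here is a rough sketch of the two helpers mentioned above: parsing a DSN-like string such as <code>h=127.0.0.1,P=16045</code>, and backtick-quoting a MySQL identifier.  It is written in Python purely for illustration; it is not the toolkit&#8217;s Perl code, and the function names <code>parse_dsn</code> and <code>quote_identifier</code> are assumptions made for this example.</p>

```python
# Hypothetical sketch, not the toolkit's actual code: parse a DSN-like
# string such as "h=127.0.0.1,P=16045" and quote MySQL identifiers.


def parse_dsn(dsn):
    """Split 'key=value,key=value' into a dict, rejecting malformed parts."""
    settings = {}
    for part in dsn.split(","):
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError("malformed DSN part: %r" % part)
        settings[key] = value
    return settings


def quote_identifier(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


if __name__ == "__main__":
    # The DSN below is the one used with mk-table-sync elsewhere in this feed.
    print(parse_dsn("h=127.0.0.1,P=16045"))
    print(quote_identifier("sakila") + "." + quote_identifier("film"))
```

<p>Each helper is only a few lines long, which is exactly why copying it into every script feels harmless, and why slightly different copies can drift apart unnoticed.</p>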

<p>As Kurt Vonnegut says, &#8220;So it goes.&#8221;  This is the death not only of quality, but of maintainability and extensibility.  The Right Answer &#8482; is to write everything as modules, with proper test suites, and then make the scripts as minimalistic as possible &#8212; essentially gluing the modules together with a few lines of harder-to-test code.  That&#8217;s how I&#8217;m used to working, too, but for some reason I can&#8217;t explain, it seemed okay not to work that way with this project.  That has turned out to be a big mistake, which I&#8217;m slowly correcting out of necessity.</p>

<p>But it turns out it&#8217;s not that simple, either.  I&#8217;ve gotten a lot of emails, phone calls from friends, and bug reports about how hard it is to install or update Perl, or get a CPAN module, on many systems.  It turns out that a lot of companies are rightly suspicious of CPAN (I have a tolerate-hate relationship with it myself), and won&#8217;t let my consultant friends install or upgrade any module without a lot of red tape.  OK, you say, so bundle and distribute the modules the toolkit needs, and they can be installed locally with the toolkit.  That sounds nice, but it&#8217;s even <strong>worse</strong> for a variety of reasons.  Just to mention one: did you know that it can be a pain in the butt even to set <code>@INC</code> so a module <em>sitting in the same directory with the script</em> will be found by the script?  (Please don&#8217;t tell me how easy it is, or I&#8217;ll let you respond to the next person trying to get it to work on an obscure platform with a Perl installation from the Middle Ages.)  Okay, I&#8217;ll mention two reasons: some Perl modules have to be compiled and customized just for the operating system you&#8217;re installing them on, or they&#8217;ll segfault (of all things)!  Don&#8217;t get me wrong, I don&#8217;t think the grass is greener on the other side; no way do I want to try writing these things in C or Java.  Perl is about as portable as it gets.</p>

<p>The net result is that I have to do a lot of little tricks to make these things standalone programs, as much as humanly possible.  I&#8217;m trying to reduce dependencies on external modules, even those that are part of core Perl.  I&#8217;m re-inventing functionality because it&#8217;s not available in all versions.  I&#8217;m writing modules that can be tested, but I&#8217;m not shipping them as separate modules; I&#8217;m basically using <code>sed</code> to copy-and-paste the module&#8217;s code into the scripts.</p>
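
<p>The inlining step can be pictured roughly like this.  The sketch below is in Python rather than <code>sed</code>, and it assumes a made-up marker line of the form <code># INLINE ModuleName</code> in the script; the real build step may work quite differently.</p>

```python
# Hypothetical sketch: build a standalone script by replacing marker lines
# with the source of separately tested modules. The "# INLINE name" marker
# format is an assumption made for this example.

MARKER = "# INLINE "


def inline_modules(script_text, modules):
    """Return script_text with each '# INLINE name' line replaced by modules[name]."""
    output = []
    for line in script_text.splitlines():
        if line.startswith(MARKER):
            name = line[len(MARKER):].strip()
            if name not in modules:
                raise KeyError("no source for module: " + name)
            # Insert the module source in place of the marker line.
            output.append(modules[name].rstrip("\n"))
        else:
            output.append(line)
    return "\n".join(output) + "\n"


if __name__ == "__main__":
    modules = {"DSNParser": "# ...tested DSNParser source goes here...\n"}
    script = "#!/usr/bin/perl\n# INLINE DSNParser\n# ...script body...\n"
    print(inline_modules(script, modules), end="")
```

<p>The point of a step like this is that the module stays the single tested source of truth, while the shipped script still has no external files to locate at run time.</p>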

<p>Why am I doing all this work?</p>

<p>Because it&#8217;s less work than not doing it.</p>

<p>But it is <em>significantly</em> more work than just whacking together some &#8220;scripts&#8221; and uploading them.  That&#8217;s why there is a critical mass beyond which it gets harder to grow a project.  The solution is to find a way to do things differently: work smarter, not harder.  The challenge is to fight the demons of bad code and poor maintainability on my own terms.  In other words, don&#8217;t fight against these characteristics of growth; make them work for me.  I won&#8217;t say I&#8217;ve learned that lesson completely, but I&#8217;m starting.  That&#8217;s why I&#8217;m automating basically everything about this project (though for some reason I can&#8217;t get WWW::Mechanize to stay logged into SourceForge, so I&#8217;m having a hard time automating part of the release process).</p>

<p>I&#8217;m also considering ways to provide this toolkit without taking so much out of my own pocket.  What started out as me developing tools for my employer, who graciously agreed to let me make them available on SourceForge, has gone far beyond my employer&#8217;s needs now.  I can&#8217;t ask my employer to carry the weight, so it has fallen to me for a while now.  That&#8217;s okay for some period while I work out how to do it differently, but not indefinitely.  Among other things, it cuts into time I want to spend with my wife.  Charging for support has definitely crossed my mind, as has some kind of community/enterprise split (such as the one <a href="http://www.zmanda.com/">Zmanda</a> has).  I don&#8217;t want to go there yet, so I&#8217;m just asking for a week of sponsored time off work, to begin with.</p>

<p>By the way, the process of replacing copy/pasted code isn&#8217;t without its hitches.  I just found and fixed a bug in MySQL Table Checksum that I caused by moving the DSN parsing code to a module.  And someone else just reported a different bug in another tool, where it turns out the copy/pasted code wasn&#8217;t quite identical and I changed the functionality by moving it to the module.  Release early, release often.  Rely on users to <a href="http://code.google.com/p/maatkit/">find bugs and report them</a>.  So it goes.</p>

<p><strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2008/12/23/does-mysql-really-have-an-open-source-business-model/' rel='bookmark' title='Permanent Link: Does MySQL really have an open-source business model?'>Does MySQL really have an open-source business model?</a></li>
<li><a href='http://www.xaprb.com/blog/2008/05/14/mysql-free-software-but-not-open-source/' rel='bookmark' title='Permanent Link: MySQL: Free Software but not Open Source'>MySQL: Free Software but not Open Source</a></li>
<li><a href='http://www.xaprb.com/blog/2009/03/08/making-maatkit-more-open-source-one-step-at-a-time/' rel='bookmark' title='Permanent Link: Making Maatkit more Open Source one step at a time'>Making Maatkit more Open Source one step at a time</a></li>
<li><a href='http://www.xaprb.com/blog/2009/04/29/what-does-an-open-source-sales-model-look-like/' rel='bookmark' title='Permanent Link: What does an open source sales model look like?'>What does an open source sales model look like?</a></li>
<li><a href='http://www.xaprb.com/blog/2011/07/04/measuring-open-source-success-by-jobs/' rel='bookmark' title='Permanent Link: Measuring open-source success by jobs'>Measuring open-source success by jobs</a></li>
</ul>]]></content:encoded>
			<wfw:commentRss>http://www.xaprb.com/blog/2007/11/05/growth-limits-of-open-source-vis-a-vis-mysql-toolkit/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

