Tag Archive for 'sourceforge'

Maatkit version 2152 released

Download Maatkit

Maatkit version 2152 is ready for download. This release is also known as the “is this project really alive?” release. We thought we should delay until MySQL released a new Community Server version. Just kidding — it has nothing to do with that.

This release is also very significant in that it’s the first one that has large code contributions by someone other than myself. As you may know, Percona (my employer) has hired the very talented Daniel Nichter, author of mysqlreport and other goodies, to help with Maatkit. So far it is a match made in heaven, and Daniel did most of the coding for this release.

This is also our first release since Ask helped me move the project to Google Code (thank you, Ask!). That means you finally get a decent interface for entering issues and the like. The only thing remaining on Sourceforge at this point is the online documentation, which I will probably move to maatkit.org soon. More importantly, it means the developers have a decent interface for managing those issues, too. Sourceforge is just a bloody nightmare: their site keeps getting harder and harder to use, both as a developer and as a user. It had gotten to the point where simply adding the files to the site for download would take me hours. I tried to automate it, in true Perl fashion, but their make-a-release forms resisted my every effort. I cannot say what a relief it is to have usable project hosting that gets out of my way and lets me work. A double thanks to Ask for pushing me over the edge on this; it had been on my mind a long time. And thanks to Google, too, for a great project management interface.

Also note that the Sourceforge forums and mailing lists are dead. Google Groups is the preferred replacement.

Keep reporting those bugs and feature requests!

As you might expect, the changelog for such a long release cycle is, er, large. There’s a lot of new stuff here. I’d like to highlight the new features in mk-parallel-dump and mk-parallel-restore — which I just used to reduce a job that would have taken weeks down to mere days — and a lot of new code in mk-table-sync, as well as the up-and-coming mk-audit, which is in release-early/often mode.

Changelog for mk-archiver:

2008-08-11: version 1.0.10

   * Files downloaded directly from SVN crashed due to version information.
   * Added more information to --statistics and changed --whyquit slightly.

Changelog for mk-audit:

2008-08-11: version 0.9.1

   * Files downloaded directly from SVN crashed due to version information.
   * Added useful functionality.

Changelog for mk-deadlock-logger:

2008-08-11: version 1.0.11

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-duplicate-key-checker:

2008-08-11: version 1.1.7

   * Files downloaded directly from SVN crashed due to version information.
   * Full-text indexes were not treated specially (issue 10).

Changelog for mk-fifo-split:

2008-08-11: version 1.0.1

   * Files downloaded directly from SVN crashed due to version information.
   * Added --offset option.
   * --statistics didn't calculate lines/sec properly.
   * Removed --sleep; EOF doesn't mean anything to a non-terminal.

Changelog for mk-find:

2008-08-11: version 0.9.12

   * Files downloaded directly from SVN crashed due to version information.
   * Added --exec_dsn so you can execute SQL on a different server.

Changelog for mk-heartbeat:

2008-08-11: version 1.0.10

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-parallel-dump:

2008-08-11: version 1.0.9

   * Files downloaded directly from SVN crashed due to version information.
   * Added --progress option.
   * CHANGE MASTER TO in 00_master_data.sql used the I/O thread position.
   * Added features to permit resuming of dumps.
   * --age without --sets did the opposite of what it should (issue 7).
   * --stopslave died after complaining the slave was not running.

Changelog for mk-parallel-restore:

2008-08-11: version 1.0.8

   * Files downloaded directly from SVN crashed due to version information.
   * Added --progress option.

Changelog for mk-query-profiler:

2008-08-11: version 1.1.11

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-show-grants:

2008-08-11: version 1.0.11

   * Files downloaded directly from SVN crashed due to version information.
   * Anonymous users were not permitted (issue 28).

Changelog for mk-slave-delay:

2008-08-11: version 1.0.8

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-slave-find:

2008-08-11: version 1.0.2

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-slave-move:

2008-08-11: version 0.9.2

   * The -m option was not recognized as an alias for --timeout.
   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-slave-prefetch:

2008-08-11: version 1.0.3

   * Files downloaded directly from SVN crashed due to version information.
   * Added the --numprefix option for use in sharded data stores.
   * The Rotate binary log event type was not handled.

Changelog for mk-slave-restart:

2008-08-11: version 1.0.8

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-table-checksum:

2008-08-11: version 1.1.28

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-table-sync:

2008-08-11: version 1.0.8

   * Files downloaded directly from SVN crashed due to version information.
   * --synctomaster did not abort when unable to discover the master.
   * An error waiting for the master to catch up caused other tables to fail.
   * Added --bufferinmysql to help make GroupBy algorithm more efficient.
   * Added safety checks to prevent changing data on a slave server.
   * Added --skipslavecheck to prevent safety checks on destination server.
   * Made the GroupBy algorithm the default replacement for Stream.
   * Added the GroupBy algorithm, which can sync tables without unique keys.
   * Syncing could stop and leave a row to delete in the destination.
   * Generate command-line help from the POD.

Changelog for mk-visual-explain:

2008-08-11: version 1.0.9

   * Files downloaded directly from SVN crashed due to version information.

Growth limits of open-source vis-a-vis MySQL Toolkit

Si Chen wrote recently about the growth limits of open-source projects. He points out that as a project becomes larger, it gets harder to maintain. I can only agree. As the MySQL Toolkit project has grown, it’s become significantly more work to maintain, document, and enhance. (This is why I’m asking you to sponsor me for a week off my regular job to work on MySQL Table Sync, by the way. Please toss some money in the hat.)

Rewriting code so it’s testable is a major focus for me now. Some of these tools have gotten complicated enough that I can’t keep track of all the code. In other words, they’re collapsing under their own weight.

Back in the project’s humble beginnings, it seemed adequate to just copy and paste a few lines here and there; after all, these are just scripts, right? Right. So I’ll just copy a few lines of code that do command-line option parsing and help screens. Hey, it turns out that several of the tools can connect to more than one server, so simple -u, -h and -p options won’t do; so I invent a DSN-like notation that lets the tools connect to an arbitrary number of servers. Copy and paste that code, too. It’s only ten lines; no big deal. Pretty soon I find out that many of the standard Perl modules aren’t available for a lot of people. And even when they’re available, people have old versions and can’t upgrade, so I can’t rely on basic things like DBI’s quote_identifier() method; time to write my own. Well, that’s only a single line! Surely that’s okay to copy and paste.
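
To make the size of these helpers concrete, here is a minimal sketch of an identifier quoter and a DSN-style parser. It is written in Python for illustration, not in the toolkit's Perl, and it is not the toolkit's code. The comma-separated key=value notation and the function names are assumptions, since the post does not spell out the DSN format. The quoting rule (wrap in backticks, double any embedded backtick) is MySQL's standard one.

```python
def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier: wrap it in backticks and double embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def parse_dsn(dsn: str) -> dict:
    """Parse a 'key=value,key=value' string into a dict.

    The notation is an assumption made for this sketch only.
    """
    parts = {}
    for item in dsn.split(","):
        if not item:
            continue  # tolerate a trailing comma
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed DSN part: {item!r}")
        parts[key.strip()] = value
    return parts


if __name__ == "__main__":
    print(quote_identifier("my`table"))
    print(parse_dsn("h=db1,u=archiver,P=3306"))
```

Each helper is only a few lines, which is exactly why copying them from script to script is tempting, and why the copies drift apart over time.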

As Kurt Vonnegut says, “So it goes.” This is the death not only of quality, but of maintainability and extensibility. The Right Answer ™ is to write everything as modules, with proper test suites, and then make the scripts as minimalistic as possible — essentially gluing the modules together with a few lines of harder-to-test code. That’s how I’m used to working, too, but for some reason I can’t explain, it seemed okay not to work that way with this project. That has turned out to be a big mistake, which I’m slowly correcting out of necessity.

But it turns out it’s not that simple, either. I’ve gotten a lot of emails, phone calls from friends, and bug reports about how hard it is to install or update Perl, or get a CPAN module, on many systems. It turns out that a lot of companies are rightfully suspicious about CPAN (I have a tolerate-hate relationship with it myself), and won’t let my consultant friends install or upgrade any module without a lot of red tape. OK, you say, so bundle and distribute the modules the toolkit needs, and they can be installed locally with the toolkit. That sounds nice, but it’s even worse for a variety of reasons. Just to mention one: did you know that it can be a pain in the butt even to set @INC so a module sitting in the same directory with the script will be found by the script? (Please don’t tell me how easy it is, or I’ll let you respond to the next person trying to get it to work on an obscure platform with a Perl installation from the Middle Ages.) Okay, I’ll mention two reasons: some Perl modules have to be compiled and customized just for the operating system you’re installing them on, or they’ll segfault (of all things)! Don’t get me wrong, I don’t think the grass is greener on the other side; no way do I want to try writing these things in C or Java. Perl is about as portable as it gets.

The net result is that I have to do a lot of little tricks to make these things standalone programs, as much as humanly possible. I’m trying to reduce dependencies on external modules, even those that are part of core Perl. I’m re-inventing functionality because it’s not available in all versions. I’m writing modules that can be tested, but I’m not shipping them as separate modules; I’m basically using sed to copy-and-paste the module’s code into the scripts.
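
The inlining step can be pictured roughly as follows. This is a sketch in Python rather than the sed command actually used, and it assumes a hypothetical marker comment (`# INLINE_MODULE: Name`) in the script. The post does not say how the real build marks the insertion point, and the module name below is made up for the example.

```python
import re

# Hypothetical marker line that says where a module's source should be pasted.
MARKER = re.compile(r"^# INLINE_MODULE: (\w+)[ \t]*$", re.MULTILINE)


def inline_modules(script_text: str, modules: dict) -> str:
    """Replace each marker line with the source of the named module.

    `modules` maps a module name to its source text, so the modules can be
    developed and tested separately and only pasted in at release time.
    """
    def replace(match):
        name = match.group(1)
        source = modules[name].rstrip("\n")
        return (
            f"# --- begin inlined {name} ---\n"
            f"{source}\n"
            f"# --- end inlined {name} ---"
        )

    return MARKER.sub(replace, script_text)


if __name__ == "__main__":
    script = "#!/usr/bin/env perl\n# INLINE_MODULE: DSNParser\nprint 1;\n"
    print(inline_modules(script, {"DSNParser": "package DSNParser;\n1;\n"}))
```

The shipped script stays a single standalone file, while the tested module remains the only place the code is edited.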

Why am I doing all this work?

Because it’s less work than not doing it.

But it is significantly more work than just whacking together some “scripts” and uploading them to Sourceforge. That’s why there is a critical mass beyond which it gets harder to grow a project. The solution is to find a way to do things differently: work smarter, not harder. The challenge is to switch the fight against the demons of bad code and maintainability so it’s on my terms. In other words, don’t fight against these characteristics of growth; make them work for me. I won’t say I’ve learned that lesson completely, but I’m starting. That’s why I’m automating basically everything about this project (though for some reason I can’t get WWW::Mechanize to stay logged into Sourceforge, so I’m having a hard time automating part of the release process).

I’m also considering ways to provide this toolkit without taking so much out of my own pocket. What started out as me developing tools for my employer, and my employer graciously agreeing to let me make them available on Sourceforge, has gone far beyond my employer’s needs now. I can’t ask my employer to carry the weight, so it has fallen to me for a while now. That’s okay for some period while I work out how to do it differently, but not indefinitely. Among other things, it cuts into time I want to spend with my wife. Charging for support has definitely crossed my mind, as has some kind of community/enterprise split (such as the one Zmanda has). I don’t want to go there yet, so I’m just asking for a week of sponsored time off work, to begin with.

By the way, the process of replacing copy/pasted code isn’t without its hitches. I just found and fixed a bug in MySQL Table Checksum that I caused by moving the DSN parsing code to a module. And someone else just reported a different bug in another tool, where it turns out the copy/pasted code wasn’t quite identical and I changed the functionality by moving it to the module. Release early, release often. Rely on users to find bugs and report them. So it goes.


MySQL Archiver 0.9.2 released

Download MySQL Archiver

This release fixes some minor bugs and adds a plugin mechanism. Now you can extend MySQL Archiver with your own code easily. You could use this to run setup and tear-down, hook code into the archiving process, and more. Possibilities include building summary tables in a data warehouse during archiving, handling dependencies such as foreign keys before archiving each row, or applying advanced logic to determine which rows to archive.

The documentation contains full details about the plugin interface, including example code.


MySQL Archiver 0.9.1 released

Download MySQL Archiver

MySQL Archiver is the implementation of the efficient forward-only archiving and purging strategies I wrote about more than a year ago. It nibbles rows from a table, then inserts them into another table and/or writes them to a file. The object is to do this without interfering with critical online transaction-processing (OLTP) queries.

Several people have asked me to release this code, which I originally wrote for my employer. As it turns out, the delay has been fruitful. I learned a lot more about query optimization during this time, found bugs with my original approach, and got exposure to different archiving needs and techniques. As a result, this tool runs something like four to ten times faster than the code I wrote last year.

I decided to write and release it now because my employer has grown to the point where we need to archive more data, faster and more flexibly. Instead of just open-sourcing the code I wrote last year, I have rewritten it from the ground up. We are using exactly the same code that I am releasing, and hope to benefit from community feedback and improvements.

I think the result is a good tool that does a lot of work for you:

  • It automatically writes efficient queries by inspecting table structures and indexes.
  • It handles transactions, lock timeouts and deadlocks.
  • It writes archived data to a file in the same format LOAD DATA INFILE uses by default.
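
As a rough illustration of the last point: with its default settings, LOAD DATA INFILE expects tab-separated fields, newline-terminated lines, backslash as the escape character, and \N for NULL. The Python sketch below writes rows that way. It is not the tool's own code, the function names are made up for the example, and it handles only the tab, newline, backslash, and NULL cases.

```python
import re


def format_field(value) -> str:
    """Render one column value in LOAD DATA INFILE's default text format."""
    if value is None:
        return "\\N"  # NULL is written as backslash-N
    # Prefix the escape character and both delimiters with a backslash.
    return re.sub(r"([\t\n\\])", r"\\\1", str(value))


def format_row(row) -> str:
    """Join the fields with tabs and terminate the line with a newline."""
    return "\t".join(format_field(value) for value in row) + "\n"


if __name__ == "__main__":
    # repr() makes the tabs and backslashes visible.
    print(repr(format_row([1, "a\tb", None])))
```

A file written this way can be reloaded without spelling out any FIELDS or LINES clauses.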

It has a lot of options and functionality, so I won’t go into it too much here. I also have several ideas I want to implement in the future, but I want to see what the community thinks of what I’ve done so far before I work on it too much more.

Despite the improvements, the basic approach remains the same: it finds the first row(s), and then on subsequent queries, it continues from where it left off, rather than scanning the whole table from the start. This makes it efficient to archive in small “nibbles,” which avoids contention with OLTP queries.
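
The forward-only idea can be sketched as below. This is a minimal Python example that uses an in-memory SQLite database so it runs on its own; it is not the tool's code or its generated MySQL queries. The table names `events` and `events_archive`, the column names, and the chunk size are all assumptions made for the example.

```python
import sqlite3


def nibble(conn, chunk_size=2):
    """Yield batches of rows in key order, resuming after the last key seen."""
    last_id = None
    while True:
        if last_id is None:
            # First nibble: start from the beginning of the index.
            rows = conn.execute(
                "SELECT id, payload FROM events ORDER BY id LIMIT ?",
                (chunk_size,),
            ).fetchall()
        else:
            # Later nibbles: seek past the last row handled, no rescan.
            rows = conn.execute(
                "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, chunk_size),
            ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]


def archive_all(conn, chunk_size=2):
    """Move every row to the archive table, one short transaction per nibble."""
    moved = 0
    for batch in nibble(conn, chunk_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO events_archive (id, payload) VALUES (?, ?)", batch
            )
            conn.executemany(
                "DELETE FROM events WHERE id = ?", [(row[0],) for row in batch]
            )
        moved += len(batch)
    return moved


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE events_archive (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO events (id, payload) VALUES (?, ?)",
        [(i, f"row {i}") for i in range(1, 6)],
    )
    conn.commit()
    print(archive_all(conn), "rows archived")
```

Because each query seeks directly to the row after the last key, each nibble touches only a few rows and holds its locks briefly.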

I’ve put almost 30 extra-curricular hours into this recently. Most of the time has gone into making sure every different type of archiving job my employer needs to run can be generated as efficiently as possible with a minimum of fuss, such as a simple command-line option or two. I’m eager to hear what you think of it, whether it meets your needs, and how it can be improved. And I’m glad I’ve finally gotten it done after all this time!

About MySQL Toolkit

MySQL Toolkit is a set of essential tools for MySQL users, developers and administrators. The project’s goal is to make high-quality command-line tools that follow the UNIX philosophy of doing one thing and doing it well. They are designed for scriptability and ease of processing with standard command-line utilities such as awk and sed.
