Xaprb


Growth limits of open-source vis-a-vis MySQL Toolkit


Si Chen wrote recently about the growth limits of open-source projects. He points out that as a project becomes larger, it gets harder to maintain. I can only agree. As the MySQL Toolkit project has grown, it’s become significantly more work to maintain, document, and enhance. (This is why I’m asking you to sponsor me for a week off my regular job to work on MySQL Table Sync, by the way. Please toss some money in the hat.)

Rewriting code so it’s testable is a major focus for me now. Some of these tools have gotten complicated enough that I can’t keep track of all the code. In other words, they’re collapsing under their own weight.

Back in the project’s humble beginnings, it seemed adequate to just copy and paste a few lines here and there; after all, these are just scripts, right? Right. So I’ll just copy a few lines of code that do command-line option parsing and help screens. Hey, it turns out that several of the tools can connect to more than one server, so simple -u, -h and -p options won’t do, and I invent a DSN-like notation that lets the tools connect to an arbitrary number of servers. Copy and paste that code, too. It’s only ten lines, no big deal. Pretty soon I find out that many of the standard Perl modules aren’t available to a lot of people. And even when they’re available, people have old versions and can’t upgrade, so I can’t rely on basic things like DBI’s quote_identifier() method; time to write my own. Well, that’s only a single line! Surely that’s okay to copy and paste.
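A homegrown replacement for DBI's quote_identifier() is exactly the kind of one-liner that ends up pasted into every script. The post doesn't show the toolkit's Perl version, so what follows is only a rough Python sketch of what such a helper has to do for MySQL. It assumes MySQL's default backtick quoting, where a literal backtick inside a name is escaped by doubling it. The function name simply mirrors DBI's; this is neither DBI's implementation nor the toolkit's code.

```python
def quote_identifier(*parts: str) -> str:
    """Quote one or more MySQL identifier parts, e.g. a database and a table.

    Illustrative sketch only: assumes backtick quoting (MySQL's default).
    """
    quoted = []
    for part in parts:
        # A literal backtick inside a name is escaped by doubling it.
        quoted.append("`" + part.replace("`", "``") + "`")
    # Qualified names such as db.table are joined with a dot.
    return ".".join(quoted)


print(quote_identifier("sakila", "film"))  # `sakila`.`film`
print(quote_identifier("odd`name"))        # `odd``name`
```

Small as it is, this is where pasted copies tend to drift apart: one copy picks up the backtick-doubling fix and another doesn't. A single tested module avoids that.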

As Kurt Vonnegut wrote, “So it goes.” This is the death not only of quality but also of maintainability and extensibility. The Right Answer™ is to write everything as modules, with proper test suites, and then make the scripts as minimalistic as possible, essentially gluing the modules together with a few lines of harder-to-test code. That’s how I’m used to working, too, but for some reason I can’t explain, it seemed okay not to work that way with this project. That has turned out to be a big mistake, which I’m slowly correcting out of necessity.

But it turns out it’s not that simple, either. I’ve gotten a lot of emails, phone calls from friends, and bug reports about how hard it is to install or update Perl, or get a CPAN module, on many systems. A lot of companies are rightly suspicious of CPAN (I have a tolerate-hate relationship with it myself), and won’t let my consultant friends install or upgrade any module without a lot of red tape. OK, you say, so bundle and distribute the modules the toolkit needs, and they can be installed locally with the toolkit. That sounds nice, but it’s even worse, for a variety of reasons. Just to mention one: did you know that it can be a pain in the butt even to set @INC so that a script finds a module sitting in the same directory? (Please don’t tell me how easy it is, or I’ll let you respond to the next person trying to get it to work on an obscure platform with a Perl installation from the Middle Ages.) Okay, I’ll mention a second reason: some Perl modules have to be compiled and customized for the specific operating system you’re installing them on, or they’ll segfault (of all things)! Don’t get me wrong; I don’t think the grass is greener on the other side. No way do I want to try writing these things in C or Java. Perl is about as portable as it gets.

The net result is that I have to use a lot of little tricks to make these tools standalone programs, as far as humanly possible. I’m trying to reduce dependencies on external modules, even those that are part of core Perl. I’m re-inventing functionality because it’s not available in all Perl versions. I’m writing modules that can be tested, but I’m not shipping them as separate modules; I’m basically using sed to copy and paste each module’s code into the scripts.

Why am I doing all this work?

Because it’s less work than not doing it.

But it is significantly more work than just whacking together some “scripts” and uploading them. That’s why there is a critical mass beyond which it gets harder to grow a project. The solution is to find a way to do things differently: work smarter, not harder. The challenge is to fight the demons of bad code and poor maintainability on my own terms. In other words, don’t fight against these characteristics of growth; make them work for me. I won’t say I’ve learned that lesson completely, but I’m starting. That’s why I’m automating basically everything about this project (though for some reason I can’t get WWW::Mechanize to stay logged into SourceForge, so I’m having a hard time automating part of the release process).

I’m also considering ways to provide this toolkit without taking so much out of my own pocket. What started out as me developing tools for my employer, who graciously agreed to let me release them on SourceForge, has now gone far beyond my employer’s needs. I can’t ask my employer to carry the weight, so it has fallen to me for a while now. That’s okay temporarily, while I work out how to do it differently, but not indefinitely. Among other things, it cuts into time I want to spend with my wife. Charging for support has definitely crossed my mind, as has some kind of community/enterprise split like Zmanda’s. I don’t want to go there yet, so for now I’m just asking for a week of sponsored time off work.

By the way, the process of replacing copy/pasted code isn’t without its hitches. I just found and fixed a bug in MySQL Table Checksum that I caused by moving the DSN parsing code to a module. And someone else just reported a different bug in another tool, where it turns out the copy/pasted code wasn’t quite identical and I changed the functionality by moving it to the module. Release early, release often. Rely on users to find bugs and report them. So it goes.

Written by Xaprb

November 5th, 2007 at 10:50 am

5 Responses to 'Growth limits of open-source vis-a-vis MySQL Toolkit'


  1. At the risk of sounding obvious, there are a multitude of ways to get modules to be included without too much trouble… I can only assume you didn’t like them, because they *do* exist.

    fenway

    5 Nov 07 at 2:04 pm

  2. I’m open to hearing what they are, but it’s not getting modules included that’s too much trouble in the general case. It usually works fine. It’s when someone who doesn’t know Perl well emails me saying “hi, I’m in the field with a consulting client and they have Perl 4.x running on Solaris, and I can’t get this working. I don’t know Perl. Can you help?” I’m trying to make this easy for these folks so they don’t have to ask for help. I’ve had this trouble with innotop, which has a parser module that several people have had trouble getting innotop to find. In the future innotop will be a single file, too.

    Sometimes even the basic “perl Makefile.PL; make install” routine doesn’t go smoothly (for example, it doesn’t work at all on Windows). For that reason, my goal is that you should be able to untar the downloaded file and run the tools with NO installation or anything else necessary (installing is just a convenience for those who want it). I can’t help it that DBI and DBD::mysql are prerequisites, but I can avoid as many other prerequisites as possible.

    I gave some specific examples of what’s hard for people, but in general, what’s hard for people is if they’re expected to understand how Perl works, so I want to shield them from it as much as possible. That mindset may deserve criticism, but ultimately I think it makes it easier for people to benefit from the tools.

    Xaprb

    5 Nov 07 at 2:20 pm

  3. Agreed… it sounds like the platform/OS cross-dependency issues are the real deal-breaker here.

    fenway

    5 Nov 07 at 3:11 pm

  4. Another option is to grow the number of people hacking on the code. I suspect with decent unit testing and good regressions, you could end up with fairly stable releases.

    That said, thanks for writing all of this! It saved me quite a bit of time over this last week. If I can help in any way, please do let me know.

    Best,

    -Peter

    Peter

    8 Nov 07 at 4:55 pm

  5. Hi Peter,

    I think once I finish ripping out all the copy/pasted code and replacing it with unit-tested code, releases will be very good quality. I’m pretty decent at writing tests; I just didn’t do it before (again, I’m not sure why).

    You can contribute to sponsoring my week off work if you’d like. I think that will be highly productive.

    Help on the code is always welcome too, but I understand not everyone wants to (or can) do that.

    Xaprb

    8 Nov 07 at 5:14 pm
