Archive for May, 2007

Why IS NULL doesn’t always work in MySQL

How can a query like “SELECT * FROM tbl WHERE col IS NULL” return a row where the column has a (non-NULL) value? Read on to find out.

First, a demonstration

Try this one yourself:

create table test(
   a int not null auto_increment,
   b int not null,
   primary key (a)
);

insert into test(a, b) values (0, 1);

select * from test where a is null;
+---+---+
| a | b |
+---+---+
| 1 | 1 | 
+---+---+

Your reaction might be, as mine was, “what the heck is going on here?” And then you might re-try the query, just because you can’t believe your eyes, and guess what you get this time?

select * from test where a is null;
Empty set (0.00 sec)

What happened?

A lot of weird things happened here:

  1. MySQL didn’t insert 0 into the first column; it inserted 1.
  2. I asked for rows where the first column is NULL, which should return no rows, but I got a row where it is clearly not NULL.
  3. I repeated the query and got a different result.

It’s a feature, not a bug

Believe it or not, these are features, not bugs. No, really!

  1. MySQL inserted a 1 into the column because by default, inserting either a zero or nothing (e.g. omitting the column or explicitly inserting NULL) into an auto-increment column tells MySQL to generate the next value in the auto-increment sequence. In newer MySQL versions, you can set the Server SQL Mode to disable this behavior.
  2. MySQL returned the last inserted row for compatibility with some brain-dead systems. (Does this make MySQL brain-dead? Well, maybe this particular behavior is a little brain-dead). This is controlled by the SQL_AUTO_IS_NULL server variable, which according to the manual,

    If set to 1 (the default), you can find the last inserted row for a table that contains an AUTO_INCREMENT column by using the following construct:

    WHERE auto_increment_column IS NULL

    This behavior is used by some ODBC programs, such as Access.

  3. MySQL only returns the last inserted row once; the next statement will return a sane result (LAST_INSERT_ID() does not have this limitation).

Bizarre, eh? However, any system that’s really used much in the real world will end up with these kinds of idiosyncrasies to meet particular needs. Oh, the joy of trying to be compatible with Microsoft Access!

Conclusion

Know thy MySQL Manual. I only discovered these things by accident. The good news is, the manual is really, really good; it has to be one of the best pieces of documentation I know. Now, if only innotop’s manual were that good…

Technorati Tags:No Tags

You might also like:

  1. How MySQL decides to AUTO_INCREMENT a value
  2. SQL Server 2000 date and time puzzler
  3. Why NULL never compares false to anything in SQL

My unorthodox CAPTCHA blocked thousands of spam comments every week

I wrote a custom image-less CAPTCHA for my blog a while ago. I didn’t write it as a plugin, so I lost it when I upgraded Wordpress a couple weeks ago. Not having this protection was an eye-opening experience, and vindicated what I asserted in my original posts: a naive question-and-answer system is highly effective at stopping spammers, probably as effective as scrambled images. Read on for the details.

In my original article, I hypothesized that CAPTCHAs with scrambled images just make it hard for real people to use websites, and probably don’t provide any additional protection over less obnoxious methods. I thought there was probably a sweet spot at which humans don’t find the system intrusive, and yet it’s just a tiny bit too hard for most spammers to bother cracking it. After all, comment spammers are mostly targeting wide-open Wordpress installations. Why work hard at the small fraction that resist comment spam when there are so many easy targets?

(Actually, knowing what I know about search engine optimization, I’d go after the hard-to-get ones myself if I wanted quality links, but the comment spam I get is clearly about quantity, not even an attempt to look like quality).

How much spam do I get?

My little system of multiple-choice questions such as “which of the following is blue? a) sky b) grass …” seemed to cut out the vast majority of comment spam, but I never quite knew how much until I took it away and replaced it with a default installation of WordPress 2.1. In the old system, I had to delete a comment or two a day from the moderation queue. Wanna guess how much spam I built up in a week with nothing but Akismet in the new installation? From Sunday night May 13th to the next Sunday night, I got over 1,800 spam comments.

What about Akismet?

“Ah,” you say, “but that’s really no problem. You say you had Akismet installed; it should catch most of them.” Yes, but it also catches valid comments, which I value highly and don’t want to throw away. I had to pore through the spam queue and find them. If you’ve ever tried that with 1,800 comments in the spam bucket — holy cow, that’s all but impossible. I had to log into my MySQL database at the command line and start nuking them with LIKE patterns just to get it down to something manageable. Even a couple dozen spam comments a day in the spam queue would push me over the edge. If I had to deal with thousands in the spam bucket, and dozens that weren’t caught by Akismet, I’d turn off comments.

I needed a challenge question just to stop the hemorrhaging. Instead of writing my own this time, I decided to try using a pre-built plugin. I chose the popular “did you pass math?” plugin. It is, like most WordPress plugins, not perfect — but it’s good enough. I’m down to about 15 spam comments a day in the moderation queue now. With Akismet helping, that becomes quite manageable.

Notice — and this surprised me — the “did you pass math” plugin lets through more spam than my custom solution. I’d bet dollars to donuts that’s because it’s both popular and not customized per-blog. My system was unique, so it makes sense that it worked better.

So much for the naysayers

There’s a lot of “wisdom” floating around the web (some of it in the comments on my earlier posts, showing me how easy it would be to bypass my custom solution ) that says CAPTCHAs don’t work at all, and you should just use Bayesian filters and the like. I never believed it. Now I have proof. Was my system easy to break? Absolutely, and that’s why it wasn’t a hassle for real people to use. Did it work great despite its flaws? You bet.

I may re-write my solution as a plug-in at some point, if I get time. Till then, good enough is good enough, just as it always has been.

Technorati Tags:No Tags

You might also like:

  1. CAPTCHAs without images, part 2
  2. My apologies if Bad Behavior blocked you
  3. Why CAPTCHAs don’t work well
  4. How to implement CAPTCHAs without images
  5. How to install and maintain multiple WordPress blogs easily

A review of MONyog

This is a review of Webyog’s MONyog MySQL monitoring tool. The product is still in beta and has some rough edges, but holds promise. Its design includes some interesting architectural decisions I think will help distinguish it from MySQL’s own Monitoring and Advisory Service.

MONyog overview

MONyog starts an HTTP listener on your machine. After you start the program, you point your Web browser at localhost:9999. Your browser talks to MONyog, which talks to one or many MySQL servers. My impression is that MONyog is meant to be a nearly zero-footprint desktop install you’ll put on your own desktop machine, and monitor remote servers. In other words, you don’t install it on the servers you’ll monitor.

It initially prompts you for connection information, which it saves for the future. After you create and select a connection, MONyog starts watching the MySQL instance. As far as I can see, it just retrieves SHOW STATUS and SHOW VARIABLES, though I believe there are plans for more in the future.

Your web browser, in turn, communicates with MONyog via AJAX. You view the data in a dashboard through Flash graphs. If you have more than one server on your screen, you see the graphs side by side in multiple columns. Here’s a screenshot of the dashboard (nothing’s really happening because this is just watching my wife’s laptop, which is idle).

The MONyog dashboard

You can also navigate through a hierarchical display of counter data that isn’t Flash-graphed, by clicking on the “Show All” in the navigation bar. For example, if you select “InnoDB Cache,” it renders each server side by side in a table, with a bar graph if the value is a percentage. The built-in rules specify warning thresholds for values, and the top-level knows when something underneath it has a warning. In the screenshot below, you can see my InnoDB cache hit ratio is only in the 92% range, which would be worrisome if this laptop were really doing any work. Notice the red dots next to the graph, and also on the far left-hand side of the screen, indicating a problem:

MONyog all counters

The counters are generated from ordinary JavaScript expressions. Anyone who can write JavaScript can customize the built-in counters or write their own code to generate the desired data.

If you’re monitoring Linux servers, you can specify SSH login information and MONyog will also display system stats like CPU usage.

Rough edges and drawbacks

As I said, things are a bit rough around the edges in places. There is no documentation at all, despite the fact that MONyog is featured on the Webyog homepage. (In fact, this article has more information than I have seen anywhere else on the web). As of yet it doesn’t fetch or expose much variety of data from MySQL. The Linux version is RPM-only; there’s not even a .tar.gz to download, and I wasn’t able to get it working on my Linux machine. The Linux version is also very out of date compared to the Windows version, which I tried to use under Wine but couldn’t.

As far as I can tell, it is based on some open-source software such as the Netscape Portable Runtime, but is not Free Software itself. This is a definite drawback for me.

Zero-install on servers

There’s no need to install anything on the servers you’re going to monitor, which is refreshing. From what I’ve heard, the MySQL Monitoring and Advisory Service (formerly code-named Merlin) actually requires you to install an agent on each server you’re watching, as well as another server which is dedicated to collecting data from the agents. Apparently these agents are anything but lightweight and unobtrusive; I have heard a few people griping about the need to install a Java runtime and a bunch of other things (see Jan’s comment below; this is just hearsay, and I perhaps shouldn’t have written this).

In contrast, MONyog is zero-install on your MySQL servers. It simply opens a connection and requests status and other information. I think this is really appealing from a simplicity and maintenance point of view. It means you can start monitoring an unlimited number of servers without touching the configuration on any of them. All your MONyog configuration is in a single place.

I’m not picking on the MySQL folks with my comments about installing an agent. It is one of my favorite gripes. Nagios and Munin want to install agents, too. I really dislike this. In the age of SSH, there should be more agent-less monitoring systems.

JavaScript object model

MONyog populates a JavaScript object hierarchy with the data it fetches. I think people will find it easy to customize the interface, especially since you can do it from within your web browser. You just click on the file you want to edit — including any of the built-in ones — and you see the code. It’s all there; there is nothing hidden from you. If you don’t know where to get started, just take a look at the built-in counters and copy/paste to make your own.

MONyog compared to innotop

If you’re wondering how MONyog compares to innotop, I think they’re quite different. MONyog seems much more counter-oriented to me; sort of like a Flash-based version of an RRD tool. I am not sure what the future will hold for it, but in my opinion it’s a little more narrowly focused at the moment.

Disregarding the different approaches, innotop’s feature set is much larger, of course; I haven’t seen any tool that even approaches innotop’s capabilities, so that should be no surprise.

Conclusion

I continue to be interested in all sorts of monitoring systems, both for MySQL and in general, and I’m trying to learn as much about them as I can. MONyog looks like the beginnings of a nice offering, though ultimately I am unlikely to use it since it’s not Free Software. The most important advantages over other monitoring systems, in my opinion, are the simplicity and ease of use, the agent-less installation, and customizability.

Technorati Tags:No Tags

You might also like:

  1. Installing innotop on FreeBSD and Gentoo
  2. How to update a GCC profile on Gentoo
  3. A look at innotop’s new features
  4. Version 0.1.146 of innotop released
  5. How to install innotop

How to write MySQL conditional comments

MySQL’s version-specific conditional comment syntax confused me for the longest time. Then I learned about printf formatting rules, and it all became clear. Read on if you don’t already know what I mean.

First I’ll explain what I’m talking about. MySQL allows specially formatted comments with a server version number embedded in them. Servers newer than the version number will execute the commented-out code, and older servers won’t. This is really useful sometimes. For example, if you want a SQL script only to run on versions where that feature is supported, you can write it in a conditional comment. The output of mysqldump usually contains a lot of them.

They look like this: /*!32358 ... CODE ...*/;. One useful example is SHOW /*!50002 GLOBAL */ STATUS, which will show global status on all MySQL versions, instead of showing global status on older versions and session status on newer versions.

You can find examples in many places, but for a long time, I didn’t know how to write my own. I didn’t know which numbers should go where. For example, if I wanted to write something to run in 4.1.22 and greater, how should I write the version number? Is it /*!4122*/ or /*!41022*/ or /*!40122*/… I just got lost.

Once I figured it out, of course, it’s very simple. The version number always has five digits. The first number is the major version. The next two are the minor version, left-padded with zeros, and the last are the revision, also left-padded. In terms of sprintf, it looks like sprintf('%d%02d%02d', major, minor, revision). So the number I needed above was 40122.

The advantage to this formatting is that you can do a string comparison on the formatted version numbers to determine whether one is greater than the other; you don’t have to compare each part of the version number separately.

I remembered my long period of confusion recently when I was writing some conditional comments for a new tool. It felt almost too simple to write about, but maybe I’m not the only one who was in the dark for a long time!

Technorati Tags:No Tags

You might also like:

  1. Version 0.1.123 of innotop released
  2. Version 0.1.132 of innotop released

MySQL Table Sync 0.9.3 released

Download MySQL Table Sync

This release of MySQL Table Sync adds the ability to sync only part of a table, adds minor new functionality, and fixes a bug that could crash the tool.

I will be focusing some effort on this tool soon. If you would like to see particular features, please consider sponsoring them.

Technorati Tags:No Tags

You might also like:

  1. MySQL Toolkit version 946 released
  2. JavaScript number-formatting library updated
  3. MySQL Toolkit released as one package
  4. I need your advice on how to package MySQL Toolkit as one file
  5. MySQL Table Sync 0.9.2 released

Two Perl documentation time-savers

If you are a Perl developer and you only think of perldoc as the command that formats embedded POD (Plain Old Documentation) and lets you read it like a man page, you’re missing two huge time-savers this versatile command can give you.

Function documentation

Suppose you’re programming and you’ve forgotten the exact return values of localtime(). What do you do?

  • Get out your Camel book, open it to the alphabetical function reference, and hunt for localtime (it’s on page 738 and took me 15 seconds to find, but I got lucky and found the right page with only four flips).
  • Open your web browser and go to the online Perl documentation, click, click, click, there it is (took another 15 seconds, but my browser was already open).
  • Open a terminal and type man perlfunc, type /localtime, and press ‘n’ four times (8 seconds).

If you do any of the above, you’re missing a shortcut that works for any of Perl’s built-in functions:

  • Type perldoc -f localtime.

This command extracts the localtime section from the perlfunc man page and shows it to you. Elapsed time: a second or two.

Perl FAQs

If you haven’t used the Perl FAQs, you’re missing a treasure trove of answers to just about any question you might have about Perl. (Sometimes I have what I think is an obscure question, and it’s in the FAQ; I have begun to doubt that these questions are all frequently asked).

How do I prompt the user for a password? How do I write a switch statement? How do I convert from textual date-time formats back to a UNIX timestamp?

These questions and hundreds more are in the FAQ. To get to them, you can type man perlfaq, which is just an index into the detailed FAQ pages, and search for a keyword, such as “switch.” When you find it, you can scroll up a few pages and see it’s in Perl FAQ 7, General Perl Language Issues. Now you can quit this FAQ page and type man perlfaq7, then search again for “switch.”

By this time you’re probably asking “how is this a timesaver, with all those steps just to get to the answer?” It took me 15 seconds to do. Compared to a Google search and reading through things that make it sound like I’ll need to upgrade my Perl or download some CPAN module, it is a timesaver, but you can do even better.

Try typing perldoc -q switch. Oooh, is that neat or what? It searches the FAQ pages and pulls out any entries that match your term. Elapsed time: a second or two.

Conclusion

Documentation is one of Perl’s greatest strengths, but knowing how to use it well is even better.

Now if you’ll excuse me, I must go write some Perl :-)

Technorati Tags:No Tags

You might also like:

  1. Review of Perl Best Practices
  2. An alternative to canonical URIs

MySQL Table Checksum 1.1.5 released

Download MySQL Table Checksum

MySQL Table Checksum 1.1.5 adds a some useful features and fixes a couple of bugs. Now you can checksum tables in chunks, and there is an option to pause between chunks as well. This has already helped me recover a very large table that got out of sync on the slaves, and many of you have also requested this feature.

You can read the full release notes and documentation for the details, but here’s an overview:

Checksumming by chunks

This feature allows you to specify a maximum chunk size, such as one million. MySQL Table Checksum will look for the presence of a unique integer-valued column, such as an AUTO_INCREMENT primary key. If it finds such a column, it will start at the minimum of its range and checksum each range of a million together.

This doesn’t mean one million rows per chunk; it means up to a million. If there are holes in the sequence of values, of course there will be fewer than a million rows in a chunk.

The output now includes an extra column — the chunk number. This starts at 1 for every table and counts up if the table is chunk-able. This column is included in the --replicate data as well, so you will need to add another column to your checksum table if you are using --replicate.

In addition to being easier on the server, doing checksums in chunks also means it’s easier to use MySQL Table Sync to synchronize the table if you find problems with it. You can just work on the part of the table that has errors, instead of the whole table. (This code is in Subversion, but not yet released. I need to work on several other things with MySQL Table Sync as well).

I am mulling over ways to do chunking with non-numeric data, such as dates and characters. It would also be nice if it could handle multi-column keys. It seems much harder, but I may be able to find a way. I have already done a lot of work towards this end with the table-syncing algorithms I’ve designed. This functionality is just a first cut; I wanted to get feedback before adding any more. If you need to, you can manually specify a chunk column, which should let you use the functionality on tables MySQL Table Checksum would decline to chunk on its own.

Pause between chunks

If you specify the --sleep option, MySQL Table Checksum will pause and release locks between every checksum. This works on whole tables and on chunks.

Bug fixes

The most important bug fix is an optimization that was disabled due to a combination of things. The BIT_XOR strategy should be able to optimize itself and be faster now, though I don’t know exactly how much.

About MySQL Toolkit

MySQL Toolkit is a set of essential tools for MySQL users, developers and administrators. The project’s goal is to make high-quality command-line tools that follow the UNIX philosophy of doing one thing and doing it well. They are designed for scriptability and ease of processing with standard command-line utilities such as awk and sed.

Technorati Tags:No Tags

You might also like:

  1. MySQL Table Checksum 1.1.6 released
  2. Progress on Maatkit bounty, part 2
  3. MySQL Toolkit version 675 released
  4. Progress on Maatkit bounty, part 3
  5. Maatkit version 1508 released

How fast is MySQL Table Checksum?

A few people have asked me how fast MySQL Table Checksum is. As with so many other things, it depends. This article shows how long it takes to checksum real data on a production server I help manage, which might give you a rough idea of how long it’ll take on your servers.

The server and workload

This server is a replication master running MySQL 5.0.38. It is a Dell Poweredge 1800 series with dual Xeon 3.4GHz processors and 2GB RAM, with three 15K SCSI hard drives in a RAID5 configuration. It serves about 40GB of data in InnoDB tables and about 25GB in MyISAM.

I can’t say too much about the workload, but I’ll tell you what I can. At the time I ran these checksums, it was running many rollup and LOAD DATA INFILE queries on the tables I checksummed. These tend to do a lot of updates, deletes, and inserts. There are also several processes that REPLACE or INSERT.. ON DUPLICATE KEY UPDATE large parts of certain tables which are in the 2-8GB range. At the same time, there are processes running rapid-fire single-row lookups and GROUP BY queries against all or part of these tables. And certain other larger-than-memory tables elsewhere in the server were being updated too, probably flushing the cache.

In other words, these results are under heavy load, and not scientific or repeatable at all (there was definitely heavier load on some of the test runs than others). But it gives you an idea. Your mileage may vary.

The results

The following table shows the number of seconds it took to checksum several heavily-used InnoDB tables with various checksum strategies: MySQL’s built-in CHECKSUM TABLE, the ACCUM strategy with replication as an INSERT/SELECT (acquires share-mode locks on the whole table for replication consistency), the ACCUM strategy as a plain SELECT without share-mode locks, and the BIT_XOR strategy. If you wish to know more about these strategies and what they do, the MySQL Table Checksum documentation explains them in great detail.

The last column shows how long it takes to run COUNT(*) on the tables in question. As you can see, taking a checksum is sometimes not that much more expensive than a simple COUNT() on InnoDB.

Rows Table Size CHECKSUM Replicate/ACCUM ACCUM BIT_XOR COUNT
49,152 317 0 0 0 0 0
1,589,248 14,472 0 1 0 1 1
100,843,520 638,144 15 12 13 25 8
332,316,672 504,652 52 56 43 77 4
2,167,996,416 4,341,475 258 303 335 541 151
318,636,032 517,406 31 10 45 92 1
1,064,960 3,105 1 1 1 0 0
2,818,048 2,369 0 1 1 1 0

Conclusion

How fast is it? On this server, I’m getting about 2GB in roughly five minutes with the ACCUM strategy, or an average of around .44GB/min, under heavy load.

In real terms, the checksum algorithms I’ve designed are within roughly an order of magnitude of the speed of COUNT(*) on InnoDB, and as the table gets larger, this difference decreases to about half, probably because of InnoDB’s caches. I think that’s the most interesting result from these tests.

It’s also interesting that my checksum algorithms are almost as fast as MySQL’s own CHECKSUM TABLE command on this data. This makes sense, given that they both essentially scan the whole table, but I didn’t expect it, given that CHECKSUM TABLE is compiled into the server.

Technorati Tags:No Tags

You might also like:

  1. Introducing MySQL Parallel Restore
  2. How to know if a MySQL slave is identical to its master
  3. Introducing MySQL Table Checksum
  4. MySQL Toolkit version 896 released
  5. How to sync tables in master-master MySQL replication

How to eliminate temporary tables in MySQL

I’ve written before about how to make MySQL replication reliable. One thing I think you need to do to make statement-based replication reliable is eliminate temporary tables. I found an elegant way to replace temporary tables with real tables in the systems I maintain. This article explains how.

The problem

Temporary tables are anathema to reliable MySQL slave servers. If you have a temporary table and the slave crashes in between accesses to the temporary table, when you restart replication the temporary table no longer exists, and you are in trouble.

This is only a problem with statement-based replication, which is how MySQL replication works until version 5.1, which is currently in beta.

If you want to be able to stop and start slave servers at will (for backups, failover, etc) or recover smoothly from crashes, in my opinion you must completely eliminate temporary tables on the master. Note that I’m talking about true temporary tables created with CREATE TEMPORARY TABLE, not temporary tables created internally by MySQL for sorting or processing subqueries.

Properties of temporary tables

Temporary tables have some nice properties:

  • They are private to a connection. No other connection can see them, and there are no naming conflicts when many connections create temporary tables with the same name.
  • They go away by themselves when the connection closes, so you don’t have to clean up after crashed connections.

These are compelling reasons to use temporary tables for many purposes. However, these very properties have drawbacks:

  • Since they’re private to a connection, you can’t debug an application easily if you need to see the data it has generated and is working on.
  • Since they don’t persist after a connection closes, they cause problems with replication, as I mentioned.

Any good solution to the problems should ideally offer some of the benefits of temporary tables. Applications need to be able to avoid naming conflicts, and cleanup needs to be easy. And it would be nice to avoid the problems too — I’d like to see a connection’s data, and I’d like to avoid replication issues. Can all this happen? I think so, if you’re willing to give up a small amount of convenience, and if there are no privacy issues.

One possible solution

One common solution I’ve seen is to use real tables, which the application will drop when it’s finished. I call these “scratch tables” to distinguish them from temporary tables.

The usual advice is to create randomly named scratch tables in a database reserved for the purpose. This is a way to avoid naming conflicts, though you still have to check before you create your table, on the off chance there is a conflict. Cleanup should usually be handled by the application itself doing a DROP TABLE when it’s done, but in case it crashes or something else happens, you can just run a periodic job to delete all tables in the scratch database that are over some age (two hours would usually suffice for my applications).

mysql-find can do the purging easily, so that’s no trouble. But there are some drawbacks to this method:

  • Randomly named tables, and checking for an existing table before creating, is a bit messy and is prone to race conditions.
  • Quick, which connection is using table d41d8cd98f00b204e9800998ecf8427e, and what is it used for? Suppose the application creates a scratch table of users, one of messages, and one of interests — what kind of table is that? I don’t know either.
  • I don’t like guessing when a table isn’t in use anymore. What if an application was running slowly because, for example, it’s talking to an API that’s responding slowly? Dropping its tables would be a bad thing to do.

My solution

I’ve solved all these problems by not naming tables randomly. Instead, I name scratch tables sensible names, just like normal tables, but I then append the connection ID to the table name. The resulting tables are named things like users_1234. This approach has some obvious benefits:

  • It’s easy to see what a table is for.
  • It’s easy to see which connection created the table.
  • The table name is guaranteed to be unique, since no two connections have the same ID (this is not true of UNIX process IDs if you have processes connecting from more than one host).
  • It’s really easy to generate the table name. Finding out your connection ID is trivial. In Perl, you can say $dbh ->{'mysql_thread_id'} on any database connection, and similar functionality exists for other programming languages. If you need to do it from within SQL, you can just use the CONNECTION_ID() function.
  • Finally, it’s really easy to know if a table is still in use.

That last point deserves a bit of explanation. You can run SHOW FULL PROCESSLIST and see a list of all current connections. If the number isn’t there, and you have the PROCESS privilege so you’re certain you can see all connections, the table is deadwood and should be pruned. I’m sure you saw this coming, but mysql-find can do all this for you, too; you just give it a regular expression pattern to capture the connection ID from the table name, and it takes care of the rest (including the privilege check):

mysql-find --pid '\D_(\d+)$' --exec_plus "DROP TABLE IF EXISTS %s"

Conclusion

What’s not to love?

  • Simple.
  • Elegant.
  • Solves world hunger.

Well, I’m still working on that last part…!

Technorati Tags:No Tags

You might also like:

  1. Temporary table subtleties in MySQL
  2. MySQL Find 0.9.0 released
  3. MySQL Toolkit version 1011 released

MySQL Find 0.9.0 released

Download MySQL Find

If you’ve used the UNIX find command for more than a trivial find-and-print, you know how powerful it is; it’s almost a miniature programming environment to find and manipulate files and directories. What if you could do the same thing with MySQL tables and databases? That was the inspiration for writing this tool. I was about to write several other tools to do some MySQL administrative jobs when I realized I could generalize and make something much more useful and powerful.

This first release has only the functionality I needed for the jobs I had to do. Initially I just implemented commands for working with tables, but the design evolved into something that can be more powerful with minimal added work. Though I modelled the tool after find, I didn’t build in all the complex expressions and conditions. Instead, I followed the general idea of having three kinds of options: regular options, tests, and actions.

The tests allow you to do things like select tables with more than a certain number of rows and with a given storage engine. There are numeric, date, and regular-expression tests.

You can apply actions to the selected tables, and as with find, the default action is to just print their names. There’s a --printf option and two ways to exec SQL commands.

What’s it for?

By way of introduction, let me explain what I was going to write special-purpose tools to do, and then show you how I can do it with mysql-find instead.

  • The first job was a tool I had tentatively titled mysql-measure-tables, which would gather selected data about all tables (data size, index size, number of rows) and store it into another table for forensics and analysis over time.
  • The second was a stale-table-sniper tool to find scratch tables and delete them when a process fails to clean up after itself. I had something like this implemented at my employer, but it also needed to throw away tables in some databases after they got a few weeks old. These are tables the analysts create and don’t delete.

Both are fairly easy to implement; really just a dozen or so lines of code. But as I began to implement the stale table sniper, it occurred to me that I should emulate find instead.

With mysql-find, now I can do these tasks and many more, very easily:

# Delete scratch tables created by processes that died
mysql-find --pid '\D_(\d+)$' scratch --exec_plus "DROP TABLE %s";

# Delete old tables created by analysts
mysql-find --mtime +30 analyst_scratch --exec_plus "DROP TABLE %s";

# Save table size and row count for monitoring over time
mysql-find --noquote --exec "INSERT INTO stat.tblsize(db, tbl, idxlen, datalen, rowcount) VALUES('%D', '%N', %I, %d, %S)";

I’ll write separately about the --pid option and how I use it. It’s a simple naming convention that makes life easy when you don’t want to use temporary tables (in my case, because of replication). If you’re curious, there are more details in the mysql-find man page.

Speaking of the man page, I hope you’ll find the documentation complete and useful. There are examples to stimulate your imagination too.

It’s not revolutionary, but it might be useful — who knows.

What’s next?

What features are upcoming, you say? Actually, I don’t have plans for any more functionality myself. This tool works for me as it is. But if you need something, hop onto the mailing lists, forums, or bug trackers at MySQL Toolkit Sourceforge page and ask. Other obvious additions would be the ability to work with databases, columns, indexes, foreign keys etc. And of course, if you find bugs, that’s the place to report them.

About MySQL Toolkit

MySQL Toolkit is a set of essential tools for MySQL users, developers and administrators. The project’s goal is to make high-quality command-line tools that follow the UNIX philosophy of doing one thing and doing it well. They are designed for scriptability and ease of processing with standard command-line utilities such as awk and sed.

Technorati Tags:No Tags

You might also like:

  1. Introducing MySQL Duplicate Key Checker
  2. MySQL Toolkit version 946 released
  3. MySQL Toolkit’s Show Grants tool 0.9.1 released
  4. MySQL Toolkit released as one package
  5. Two Perl documentation time-savers