Archive for May, 2006

How to escalate privileges in MySQL

In MySQL, privileges are stored in tables in the mysql database, so if I have UPDATE privileges to that database, it’s easy to update my own privileges. For an example of what the desired privileges should be, run the following query:

select * from user where User='root';

The Grant_priv column is what I’m interested in. I just update that value to ‘Y’ for my user, execute FLUSH PRIVILEGES, log out, and log back in. I’m all-powerful! I can grant myself any privileges I want, grant other people privileges, and so on. Of course, I could do that before by updating the system tables anyway, but now I can do it more conveniently with the built-in administration commands.

This isn’t just a weakness in the design of MySQL; Microsoft SQL Server 2000 also stores privileges in system tables (one of the reasons SQL injection attacks are so easy on an improperly configured instance of SQL Server 2000).

The real point I want to make in this article is it’s a bad idea to run the following query, so often recommended in how-to articles and books:

grant all on *.* to 'xaprb'@'%' identified by 'password';

If someone tells you that’s the way to add a user, don’t listen! You need to specify the databases and/or tables to which the user should have access. Otherwise, the user will have UPDATE permission in the system tables, and is all-powerful.

Technorati Tags:No Tags

You might also like:

  1. MySQL Toolkit’s Show Grants tool 0.9.1 released

CAPTCHAs without images, part 2

I want to follow up on my earlier article about how to do CAPTCHAs without images, for accessibility and usability. In that article I hoped my simple scheme would deter dumb robots, cutting down on the bulk of the comment spam I was getting.

It did cut out the comment spam — for a while. After a while, though, I started to get some spam again. I could see the spammer was either doing it manually, or had figured out that my form submission included the ID of the question I’d asked (wow, you spammers sure are smart). I tried changing the questions once or twice, thinking they might be automated set-and-forget spambots that would not get updated for a while. This seemed to have no effect at all. “Alas,” I thought, “I’ll just put up with a few spam comments every now and then.” Then it became 10 a day. That bothered me, but still not enough to do anything about it.

But then some @$$ started really slamming me. It was all about buying medications and online poker and so forth. It’s funny how easy it is to detect which spammer is which by the message style, too. The phrasing is always the same. I could tell this one was new. And I was getting hundreds of spam comments a day.

I tried a Bayesian filter plugin for Wordpress briefly, but it didn’t work quite right and I didn’t have enough time to learn about Wordpress’s plugin architecture to fix it. During that trial, comments were totally disabled on the blog. I couldn’t let that continue, so I uninstalled the plugin and kept moderating while I hoped for a few spare minutes to find a fix.

Meanwhile, even my posts about not using image CAPTCHAs were getting slain. Oh, the irony!

Finally, I managed to find a half-hour to tweak my image-less CAPTCHA system. Instead of posting back which question was on the comment form, I made it set up a session and store which question, like traditional CAPTCHA systems. I really didn’t want a heavy-weight solution where I stored the information in an actual session or in a relational database. I wanted it to be just enough to deter spammers, as before. So this time I used some encryption, some randomization, and a known bit of data that changes frequently — though I won’t say what that is — to generate a passkey and put it in a cookie. The cookie is valid only for one request, and is time-sensitive too. And since the secret changes frequently, hopefully it’s not obvious how this all works (though, as before, it wouldn’t be too hard to figure it out if you approach it from a “what would be easiest for him to do” point of view).

Basically, I went from a stateless and easily hackable system to one with a bare minimum of statefulness, and I guess it was just enough to foil the spammers. I haven’t gotten a single spam comment since. This is like being in an airplane and putting noise-blocking headphones on. It is blessed, blessed relief. And I’m really happy because I seem to have found — at least temporarily — just the right balance between people and robots.

Now, the only thing left is to wait and see how long it takes the spammers this time. I promise, I will fight image CAPTCHAs till the last resort is exhausted. Who knows how it will work out? I’ll certainly let you know!

Technorati Tags:No Tags

You might also like:

  1. My unorthodox CAPTCHA blocked thousands of spam comments every week
  2. Why CAPTCHAs don’t work well
  3. How to implement CAPTCHAs without images
  4. My apologies if Bad Behavior blocked you
  5. How to guard your privacy with blacklists and whitelists

Xaprb is featured on 9rules

The 9rules network, a trusted filter that chooses the best content on the web, has named Xaprb as a featured blog. This means a lot to me. 9rules is a great place to go and find quality content on an Internet that’s drowning in garbage, so being featured there is no small matter.

This is a good time for to make a plug for 9rules, too. One great way to stay up-to-date on the best of the web is browse the communities and find one that has interesting content, then subscribe to the aggregated feed.

If you’re not yet subscribed to Xaprb’s content via feeds or email, give that a try too :-)

Finally, thanks for the very kind words, Kyle!

Technorati Tags:No Tags

You might also like:

  1. Xaprb joins 9rules
  2. Simple and complex types in XML Schema

How to simulate FULL OUTER JOIN in MySQL

In this article I’ll show several ways to emulate a FULL OUTER join on a RDBMS that doesn’t support it, as is the case with even the most recent versions of MySQL. This useful query is surprisingly tricky to get right.

Introduction

A standard SQL FULL OUTER join is like a LEFT or RIGHT join, except that it includes all rows from both tables, matching them where possible and filling in with NULLs where there is no match. I’ll illustrate that for clarity. Here are two of my favorite tables, apples and oranges:

I’ll join them on price. Here is the left join:

select * from apples as a
    left outer join oranges as o on a.price = o.price

And the right join:

select * from apples as a
    right outer join oranges as o on a.price = o.price

The FULL OUTER JOIN of these two tables, on price, should give the following result:

That’s the result I’ll be working toward in this article. Here is a script to create and populate the example tables, so you can follow along:

create table apples (variety char(10) not null primary key, price int not null);
create table oranges (variety char(10) not null primary key, price int not null);
insert into apples(variety, price) values('Fuji',5),('Gala',6);
insert into oranges(variety, price) values('Valencia',4),('Navel',5);

Method 1: two JOINs and a UNION

One method to simulate a full join is to take the union of two outer joins, for example,

select * from apples as a
    left outer join oranges as o on a.price = o.price
union
select * from apples as a
    right outer join oranges as o on a.price = o.price

This gives the desired results in this case, but it isn’t correct for all cases. Suppose there are duplicate records in the tables (remove the primary key and insert twice to create this situation). UNION eliminates duplicates, which a full join doesn’t do. UNION ALL isn’t the right answer either, because it will cause spurious duplicates. In fact, UNION generates two independent result sets and then combines them, so there is no way to get around this, because the two result sets need to “know about each other” to produce the right results.

There are legitimate cases where duplicate results are expected and correct. For instance, even when the rows are unique, selecting only certain columns, in which there are duplicates, could cause this situation. This doesn’t apply in relational theory, because a set never has duplicates no matter what, but it does in SQL.

Method 2: UNION ALL and an exclusion join

One way to make UNION include only the duplicates I want is to use an exclusion join to eliminate anything from the second result that is already included in the first, like this:

select * from apples as a
   left outer join oranges as o on a.price = o.price
union all
select * from apples as a
   right outer join oranges as o on a.price = o.price
where a.price is null;

This handles duplicate rows correctly and doesn’t include anything it shouldn’t. It’s necessary to use UNION ALL instead of plain UNION, which would eliminate the duplicates I want to keep. This may be significantly more efficient on large result sets, since there’s no need to sort and remove duplicates.

Method 3: use a mutex table

There’s a case where UNION won’t work: older versions of MySQL don’t support it. All is not lost, though.

I’ve written several articles explaining how to start with a set of mutually exclusive numbers (which I informally call a “mutex table“), then use the mutual exclusivity of the numbers to join things together in interesting ways. This lets me simulate subqueries and unions on earlier versions of MySQL, for example. This approach seems like it might work well here, too. For the following queries I’ll assume my mutex table has the values 0 and 1. Here’s a baseline mutex query for these two tables:

select * from mutex
    left outer join apples as a on i = 0
    left outer join oranges as o on i = 1;

Of course this isn’t a full join. If I want to full join on price, naturally, I need to include price in the join criteria somewhere, and the query above doesn’t even mention the price. But it’s a starting point for tinkering.

The mutex values in the leftmost column, combined with the join criteria, ensure that every row in the two tables gets included on its own row. The mutual exclusivity causes the Navel row not to be matched to the Fuji row, even though they have the same price. The correct behavior of a full join on price is to “fill in” the NULL values where the prices are equal. This modification to the join criteria will fill it in:

select * from mutex
    left outer join apples as a on i = 0
    left outer join oranges as o on i = 1 or a.price = o.price;

The or a.price = o.price relaxes the mutual exclusivity, telling the join to keep the rows separated unless they have the same price:

That’s getting closer. There is a spurious row, though. The Navel row at the bottom of the result set shouldn’t be there; it has already been matched to the Fuji row earlier, so there’s no need to include it with all those NULLs as though there were no matching row in apples. Can I eliminate the Navel row without eliminating the Valencia row?

That turns out to be harder to do. I stared at it for a while, thinking I could include a WHERE clause that would eliminate spurious rows based on the value of i, but after a bit I got a reality check: the row has already been included above, and WHERE clauses work a row at a time, so there’s no way to assert something about one row while applying the WHERE clause to another row. This simple fact is all I needed to realize there’s no way to eliminate the Navel row with the given information.

What I can do, though, is stack another copy of the apples table onto the right-hand side of the results thus far, matching them to the oranges values and confining them to rows with mutex value 1 instead of 0. Now I can write a WHERE clause to see if a row in the i = 1 part of the result set matches a row in the i = 0 part. I’ll write it without the WHERE clause to start:

select * from mutex
   left outer join apples as a on i = 0
   left outer join oranges as o on i = 1 or a.price = o.price
   left outer join apples as a2 on i = 1 and a2.price = o.price;

Now there is a way to tell between the Navel row, which I don’t want, and the Valencia, which I do: the Navel has non-NULL values in the rightmost copy of apples, but the Valencia doesn’t. All I have to do is eliminate rows that have matching values:

select * from mutex
   left outer join apples as a on i = 0
   left outer join oranges as o on i = 1 or a.price = o.price
   left outer join apples as a2 on i = 1 and a2.price = o.price
where o.price is null or a2.price is null

If the WHERE clause is hard to understand, perhaps it’s easier to think of it this way: where not(o.price is not null and a2.price is not null). Both clauses are identical; all I did was apply some boolean identities. Here is the result:

That result has the correct rows, but it has some extra columns, which I don’t need. Here’s the final query:

select a.*, o.* from mutex
   left outer join apples as a on i = 0
   left outer join oranges as o on i = 1 or a.price = o.price
   left outer join apples as a2 on i = 1 and a2.price = o.price
where o.price is null or a2.price is null

Remember, this can’t be done without the mutex table, because I need something to provide non-NULL values for every row; otherwise the joins would not include values where the leftmost table has no rows to contribute.

This technique works if there are duplicate rows, and works on older versions of MySQL, but is probably the least efficient of the three I’ve demonstrated here. As usual, which query is appropriate depends on circumstances.

Technorati Tags:No Tags

You might also like:

  1. How to write a SQL exclusion join
  2. How to understand SQL joins
  3. How to write a UNION in SQL without using UNION
  4. Mutex tables in SQL
  5. How to write multi-table, cross-database deletes with aliases in MySQL

How to use Linux’s CONFIG_IKCONFIG_PROC feature

This article gives an overview of the Linux kernel’s CONFIG_IKCONFIG_PROC feature, which stores in the kernel the config file used to build it.

What it is

Linux can store a gzip-compressed kernel configuration file in the kernel itself, and make it available when the kernel is booted. This is very useful for determining how the kernel was configured when it was built. It makes it possible to re-create a kernel without having the configuration saved separately, makes it easier to debug problems with a kernel, and so forth.

How to enable this feature

There are two steps to making the kernel config available in the kernel when the system is booted. First, select the option to store the configuration file; second, select the option to make it available as the file /proc/config.gz.

When using the menuconfig system to edit a kernel configuration file, say yes to

File systems  --->
   Pseudo filesystems  --->
   [*] /proc file system support

That enables the /proc virtual filesystem; read the help file for more on that. Then enable the following:

General setup  --->
   [*] Kernel .config support
   [*]   Enable access to .config through /proc/config.gz

When editing the file by hand, say Y to CONFIG_PROC_FS, CONFIG_IKCONFIG, and CONFIG_IKCONFIG_PROC.

How to use it

If I have a working kernel but lost the config file I used to build it, I can boot the kernel and uncompress /proc/config.gz:

zcat /proc/config.gz > somefile

One place I’ve found this helpful is when building a kernel on a system I don’t know well. I can boot a live CD, for example the Gentoo live CD, and steal its configuration as a starting point. This doesn’t always work perfectly, but it’s easier than starting from scratch, in my opinion.

By the way, in this and all articles on this blog, I draw a distinction between Linux and the operating system. Linux is a kernel, not an operating system. GNU utilities constitute the rest of the operating system, which is why I use the term GNU/Linux when referring to the operating system as a whole.

Technorati Tags:No Tags

You might also like:

  1. What to do when innotop crashes
  2. Henceforth, I dub thee GLAMP
  3. How to update a GCC profile on Gentoo
  4. innotop 1.3.5 released
  5. How to monitor server load on GNU/Linux

How to auto-mount removable devices in GNU/Linux

Most operating systems have an auto-mount feature to detect and mount devices, such as USB flash drives, when they’re plugged in. GNU/Linux is no exception. It’s easy to install and just as easy to use. In this article I’ll show you how to get auto-mounting without changing /etc/fstab, writing udev rules, or installing a huge desktop environment such as Gnome or KDE.

There’s more than one way to do it

There are lots of ways to get things auto-mounted. I’ve browsed around the Gentoo forums, and I see dozens of ways people do it. Some people write udev rules to recognize devices. Others write custom shell scripts, tinker with /etc/fstab, or do a combination of lots of things. Probably the most common methods, though, are to use the built-in functionality that comes with the KDE or Gnome desktop environments — or to use the software they use, but not use the desktop environment itself.

Gnome uses gnome-volume-manager and gnome-volume-properties to auto-mount devices, and when appropriate runs a program, such as a gphoto2 in the case of a digital camera. I’m not sure exactly how KDE does automounting, since I haven’t used it in many years. What I am fairly sure of is that they both use the same programs to get their work done under the hood. These programs are hal (the Hardware Abstraction Layer), dbus (a system message bus), and udev (userspace device filesystems).

In my opinion, some methods are definitely easier than others. I suggest you either install Gnome or KDE, or if you’re a minimalist like myself, just install a system that doesn’t require a huge set of dependencies.

Using Gnome

If you’re running Gnome, or if you have it installed but run another desktop environment, such as XFCE, you need gnome-volume-manager to be running. Gnome usually starts it automatically. If you use another desktop environment, you need to start it so it runs in the background. For example, you could put this in your .xinitrcfile:

gnome-volume-manager &
startxfce4

That starts it running in the background, then starts XFCE.

Once you’ve got it running, you can run gnome-volume-properties to configure what it should do when it detects and mounts a volume. It can run programs, open a file browser, and so forth.

My favorite method

I prefer not to use gnome-volume-manager. For one thing, it requires an X server to be running, so it won’t work if I don’t want to start a windowing environment. Sometimes I just want to log in and work at the command line. It also requires me to install tons of other “stuff” I just don’t really need. Don’t get me wrong, I think Gnome is nice. I just think less is more.

So, I prefer to run software that does nothing but sit in the background waiting to be notified that something has been plugged in. The ivman program does that very nicely, and it’s very small.

For this to work, it needs the same things Gnome relies on: hal, dbus, and udev. All three need to be running as daemons. If you’re using Gentoo, you almost certainly have udev already, since it’s been part of the standard installation instructions for years. Other distributions should have udev as well. Installing hal through your distribution’s package management system should install dbus as a dependency. Likewise, starting the hald daemon should start dbus as a dependency. In Gentoo, installing ivman will install these as dependencies, so it’s as easy as

tigger ~ # emerge ivman
tigger ~ # rc-update add hald default

Now all the software I need is installed, and hald and dbus will start when the computer boots. All that remains is to give myself the proper permissions, and start ivman. As root, I can run vigr or use usermod to make myself a member of the plugdev group. I need to log out and back in for this to take effect. To start ivman automatically, I can add it to the end of my .bashrc file:

ivman --nofork > /dev/null 2>&1 &

Now I’m done. When I plug a device in, ivman finds it and mounts it (actually, it can do a lot more than just mount it; read the man page). I have the permissions I need to change files on the device and unmount it again. It appears under /media. If I’m running a file manager such as Nautilus, it shows up on the desktop and in the left-hand pane of the browser view, and I can unmount it with a right-click.

I’m running ivman as myself, not system-wide. There are security and convenience reasons for doing so. The man page explains more about it. It can also automatically run itself as a daemon in the background, but I’m choosing not to do that so a) it quits when I log out for security, and b) I don’t get multiple instances running when I log in and out repeatedly.

Summary

I recommend one of two methods:

  1. Use Gnome or KDE
  2. Install ivman, make yourself a member of the plugdev group, and start ivman from your .bashrc

So that’s it, it just works. I hope this helps you avoid a bunch of hacking, udev rules, and shell scripts.

Technorati Tags:No Tags

You might also like:

  1. How to convert text to columns in OpenOffice.org Calc
  2. How to monitor server load on GNU/Linux

Why NULL never compares false to anything in SQL

One of the most common questions SQL beginners have is why NULL values “don’t work right” in WHERE clauses. In this article I’ll explain it in a way I hope will make sense and be easy to remember.

If you’re new to SQL and have a hard time understanding this article, I encourage you to keep puzzling over it until the light comes on. I had to do that myself (and I’ve had to think hard about it to write this article), and I’ve seen a number of people learn SQL. NULLs always seem to be an important sticking point.

The query that won’t work right

Here are two common queries that just don’t work:

select * from table where column = null;

select * from table where column <> null;

They both return no rows! Countless SQL veterans have tried to explain this one to beginners. The beginner usually thinks the first row should return rows where c1 is NULL. The veteran then points out that NULL is never equal to anything. The beginner then thinks, “if NULL isn’t equal to anything, then ‘WHERE COLUMN IS NOT EQUAL TO NULL‘ is always true, so the second query should return all results!” The second WHERE clause is the logical opposite of the first, right? Right? Sadly, no it’s not.

The real problem: a language trap

The beginner has fallen into a language trap, which the experienced programmer probably set by saying “NULL is never equal to anything.” That statement seems to imply “NULL is NOT EQUAL TO.” Unfortunately, that’s wrong. Not only is NULL not equal to anything, it’s also not unequal to anything. This is where the language is confusing.

The truth is, saying anything with the words “equal” or “not equal” is a trap when discussing NULLs, because there is no concept of equality or inequality, greater than or less than with NULLs. Instead, one can only say “is” or “is not” (without the word “equal”) when discussing NULLs.

The right way to think about NULL

The correct way to understand NULL is that it is not a value. Not “this is a NULL value” but “this NULL is not a value.” Everything either is a value, or it isn’t. When something is a value, it is “1,” or “hello,” or “green,” or “$5.00″ etc — but when something isn’t a value, it just isn’t anything at all. SQL represents “this has no value” by the special non-value NULL. When someone says “the NULL value,” one should mentally disagree, because there’s no such thing. NULL is the complete, total absence of any value whatsoever.

What do you get when you compare a value to NULL?

Short answer: NULL. Every time. The result of comparing anything to NULL, even itself, is always, always NULL. A comparison to NULL is never true or false. Since NULL can never be equal to any value, it can never be unequal, either.

Sometimes people have difficulty understanding why a comparison to NULL can never be either true or false. Here’s an informal proof that may help:

Given the following predicates,

  1. NULL is not a value
  2. No value can ever be equal to a non-value

Here’s the proof by contradiction: Pretend for a moment that NULL is unequal to a value — say a real number, excluding infinity and negative infinity. I’ll choose an example number, say 5.

  1. Assume that NULL <> 5.
  2. That is, NULL <> 5 is a true expression (comparison operations are boolean, true or false).
  3. That means “NULL < 5 or NULL > 5” is true, since I’m dealing with finite, real numbers; if it’s not equal, it must be bigger or smaller.
  4. Therefore, there exists a real number equal to NULL; it’s either less than 5 or greater than 5.
  5. That’s a contradiction, because I took it as a given that no value can be equal to NULL.

Therefore NULL is neither equal to a value nor unequal to it, so any comparison involving NULL is neither true nor false. The result of a comparison involving NULL is not a boolean value — it is a non-value. You just can’t compare something that exists with something that doesn’t exist.

It has to be this way, because if a comparison to a non-value had a defined value, every query could be rewritten to return a wrong result. It would be possible to transform expressions to equivalent expressions that gave the opposite answer, and so on.

The correct way to write the queries

Instead of using boolean comparison operators such as less-than and greater-than, equal-to and not-equal-to, these queries must be written with the special comparison operator IS NULL:

select * from table where column is null;

select * from table where column is not null;

The IS NULL operator tests whether a value is null or not null, and returns a boolean.

The truth is, I lied

I’m trying to write this article to help people understand how non-values work in queries, so I’m being generous with the truth.

Since computers only work with things that exist, non-existence isn’t really possible, so NULLs must internally be implemented as some value, somewhere — even if it’s a value that indicates another value isn’t a value (huh?).

I’m glossing over something about comparisons to NULL, too. NULLs result in tri-valued logic; booleans are no longer just TRUE and FALSE, but can be UNKNOWN, too. The result of comparing NULLs is UNKNOWN, which is not the same thing as NULL, but that’s just semantic differences and deep mathematical pondering, and doesn’t materially affect how you write queries.

MySQL, for example, implements UNKNOWN as NULL, though it it isn’t perfectly consistent about it — try these queries:

select unknown;
select null;
select true;
select false;
select null is unknown;
select false is null;
select true is null;
select unknown is null;

Just remember NULL is neither equal nor unequal to anything, and I promise you will always be safe. It’s no use to get really picky about the fine points of NULL versus UNKNOWN and all that.

A puzzler with COUNT

Someone posted a comment on the MySQL manual page about extensions to the GROUP BY clause, and I think it’s interesting to discuss here. The query is a way to count subsets within a group:

select shoeStyle,
   count(color) as Count,
   count(color = 'red' OR NULL) as redCount,
   count(color = 'green' OR NULL) as greenCount,
   count(color = 'blue' OR NULL) as blueCount
from bowlingShoes
group by shoeStyle;

The comment’s author said “OR NULL is necessary, or you will just get a count of all rows in the group.” Why is this?

If the OR NULL is omitted, the result of the expression is a boolean, TRUE or FALSE, which are actual values. The COUNT function counts any value that exists, not whether something is TRUE or FALSE, so the query is behaving correctly.

On the other hand, the result of the expression color = 'green' OR NULL is either TRUE or NULL. Boolean expressions are short-circuited when they’re evaluated. As soon as the first sub-expression in a logical OR expression is true, the whole result is true, so when the color is green, the expression is TRUE immediately — a COUNT-able value. If the color isn’t green, the expression becomes FALSE OR NULL, which is NULL, of course — not a COUNT-able value.

You can see this in action with the following queries:

mysql> select true or null;
+--------------+
| true or null |
+--------------+
| 1            |
+--------------+
1 row in set (0.00 sec)

mysql> select false or null;
+---------------+
| false or null |
+---------------+
| NULL          |
+---------------+
1 row in set (0.00 sec)
Technorati Tags:No Tags

You might also like:

  1. Type conversion semantics of MySQL’s BETWEEN operator
  2. How to avoid an extra index scan in MySQL
  3. How to simulate the GROUP_CONCAT function
  4. How to simulate optional parameters in SQL
  5. How to find duplicate rows with SQL

How to write unit tests for ease of refactoring

Unit testing can help make code much more flexible by giving an unambigous, yes-or-no answer about whether the code is right. In this article I’ll explain an important principle for making unit tests most useful for refactoring: Test the behavior, not the implementation.

This may not seem important at first glance, but consider what happens when the implementation is tested. If the implementation changes, the test suite breaks! That makes correct refactoring impossible. Remember, to refactor correctly, the implementation must pass exactly the same tests after the refactoring.

Of course, in the real world, the test suite may not be absolutely complete and correct; the refactoring may reveal bugs the test suite didn’t catch, and if there are such bugs, that’s a good thing. So “don’t touch the tests” is not an ironclad rule, but it’s a good guiding principle for a disciplined approach.

Tesing only the behavior is a lofty goal, but in my opinion one worth striving towards. At my previous employer I wrote a .NET implementation of a special-purpose template language — essentially a re-code of XSLT. I had a very thorough test suite for the whole system, and was able to refactor from regular expression parsing to a state machine — a totally different strategy (well, okay, regexes are state machines too) without changing a single test, in an afternoon — without a single bug. That was a pretty complex language; it took me weeks to write the grammar and implement the first time, with help from two other people! It was one of my first experiences with real, hardcore unit testing, and I’ll never forget it.

It also taught me the valuable lesson I’m espousing in this article. It was partly because I was using a strongly typed language, and a highly cohesive, loosely coupled object-oriented design, that my tests were all about behavior and not implementation, but it was partly luck, too. In fact, I didn’t realize the distinction until I thought about refactoring another system at my present employer, where the unit tests check the type of the objects the system handles — making it impossible to re-implement with primitive types instead of user-defined types, unless I change the tests.

I think there’s a natural tendency to test the implementation as a coding assist. For instance, in a weakly typed language, it can be tough to make sure a variable has the desired type. I understand the desire to put it in the unit tests, and I am not sure where else it belongs, but I really think it doesn’t belong there. It makes the code harder to change, which defeats the purpose of unit testing.

Technorati Tags:No Tags

You might also like:

  1. JavaScript number-formatting library updated
  2. Temporary table subtleties in MySQL
  3. A PHP implementation of the XML DOM
  4. JavaScript date formatting benchmarks

JavaScript date formatting benchmarks

Two earlier articles demonstrated how to format and parse dates flexibly with JavaScript. I asserted in those articles that my approach was efficient, though I didn’t provide any numbers to prove my claim. This article compares the performance of my date-formatting library against several other date-formatting libraries I’ve found online.

I’d like to benchmark my date-parsing library too, but I haven’t seen any comparable implementations. By the way, my date-formatting and date-parsing libraries are wrapped into a single file, so even though I’m not actually executing the date-parsing functions in this benchmark, they’re compiled anyway.

Motivation

My motivation for this article is simply to demonstrate the truth of what I said earlier: writing code to write code can be very efficient. I’m explicitly not trying to say I’m “better” than anyone, or someone else does anything “wrong” or anything like that. This series of articles really isn’t even about date manipulation! It’s about demonstrating a programming technique: write a function that can write other functions, making it possible to optimize code whose behavior depends on input that is not known until runtime. That’s why my first date-formatting article was originally titled “JavaScript closures for runtime efficiency.”

Note that my motivation also isn’t to make the fastest date formatting code. I haven’t really tried to optimize for speed, but if I wanted to, I think I could probably find some bottlenecks and optimize my own code further.

All I’m trying to do is demonstrate the general coding methodology I used, because I often see folks using a much less optimal solution, probably because they don’t know about (or aren’t comfortable with) dynamic code generation. That’s just my opinion.

Setup

I ran these benchmarks on Firefox 1.5.0.2 on my home computer, a fairly new and powerful AMD64 machine running Gentoo GNU/Linux. I won’t bother telling you all the hardware specs… that always makes my eyes glaze over.

I created a set of pages that do nothing but include the JavaScript files needed, and run 100,000 iterations of date-formatting. I closed my browser window between each test, and nothing else was running on my machine. I ran each test several times and averaged the results, rounding to four significant digits.

I had to increase the script timeout so Firefox wouldn’t interrupt the tests. I did this by opening about:config, then changing dom.max_script_run_time to 5000 seconds.

Results

Here’s a graph of the times. The algorithms are in alphabetical order:

JavaScript date-formatting benchmark

Obviously, the method I use is much faster — between 3.15 and 4.89 times faster. Here are the results as numbers in a table. WARNING: If you click on the links to the benchmarks, your browser will probably freeze for the better part of a minute on a fast machine — maybe longer on a slow machine.

Is this an apples-to-apples comparison?

Absolutely not, and if it were, the slowness of the other methods would be even more obvious.

First of all, I’m only testing a single method of formatting — producing a date in YYYY-MM-DD format. I also haven’t been scientific enough to really be accurate.

Beyond that, though, these various bits of code I’ve benchmarked are vastly different. The one that provides the most similar formatting functionality to mine is Svend Tofte’s (that’s probably why it’s the slowest), but even that one only does parsing, not formatting (mine does both). The others are much less fully-featured, which means they’d probably be even less performant if someone extended them to implement the same set of functionality.

As I said above, I’m not doing this to pick on anyone, but the Dojo method is probably the least efficient. It’s the simplest of all, providing only a few formatting characters, and it’s not really that much faster than Svend Tofte’s implementation. It’s probably so slow because it a) uses lots of if statements and b) uses repeated string replacements with regular expressions. This is just a hunch, but if it had the rich feature set of my implementation or Svend Tofte’s, I think it would probably be the slowest by far.

Technorati Tags:No Tags

You might also like:

  1. Javascript date parsing and formatting, Part 2
  2. JavaScript date parsing and formatting, Part 1
  3. JavaScript formatting library update
  4. How to format numbers in JavaScript flexibly and efficiently
  5. JavaScript Number Formatting Library v1.3 released

Why multi-table cross-database deletes fail in MySQL

Sometimes multi-table deletes fail in MySQL with a message about an unknown table. In this article I’ll explain the exact combination of circumstances that cause it to happen.

First, the symptoms:

delete a from db1.t1 as a
    inner join db1.t2 as b  on a.c1 = b.c1;
ERROR 1109: Unknown table 'b' in MULTI DELETE

This will happen if the following are true:

  1. No database is selected or one of the tables is not in the current database
  2. The tables are aliased

According the the MySQL documentation,

Cross-database deletes are supported for multiple-table deletes, but in this case, you must refer to the tables without using aliases.

That’s correct, but it might not seem right. If the tables are fully qualified by database and table name, but are in the current database, it works. If I switch to another database and run the same statement, it won’t work.

This bit me when I was scripting out a large procedure running from a Perl script that doesn’t specify a database. I scripted it while I was connected and had a database specified. Then I ran it, and it bombed out when it got to the delete statement.

The solution is not to alias the tables. It’s less convenient, but it’s the only thing to do sometimes.

Technorati Tags:No Tags

You might also like:

  1. How to write multi-table, cross-database deletes with aliases in MySQL
  2. How to use foreign key cascades in MySQL
  3. How to delete duplicate rows with SQL