Archive for the ‘bugs’ tag
OUTER JOIN queries in SQL are susceptible to two very subtle bugs that I’ve observed a number of times in the real world. Daniel and I have been hammering out ways to automatically detect queries that suffer from these bugs, in a relatively new Maatkit tool called mk-query-advisor. It’s part of our series of advisor tools for MySQL. I wrote a blog post about it a while ago. Automated analysis of bad query patterns is a good thing to write tools to do, because catching buggy queries is hard work if you do it manually.
Let’s dive right in and analyze these subtle bugs. Warning: if you don’t understand how SQL handles NULL, you’re not going to understand the following. Many people have a hard time with NULL, which is why these bugs are so hard to understand and avoid. This is one reason why SQL is a hard language to use properly.
Bug 1: a column could be NULL for two reasons, and you can’t distinguish them
If the outer table in your query contains NULL-able columns, and you place a WHERE clause to filter out all but those rows, you’re going to get bugs because a non-matching row in the outer table will be all-NULL. Here’s an example. Let’s start with a plain outer join query:
select * from L left join R on l_id = r_id; +------+------+---------+ | l_id | r_id | r_other | +------+------+---------+ | 1 | 1 | 5 | | 2 | 2 | NULL | | 3 | NULL | NULL | +------+------+---------+
Here we see that one row in the outer table is missing, and one row (the middle row) has a NULL r_other column. Now, let’s add a WHERE clause:
select * from L left join R on l_id = r_id where r_other is null; +------+------+---------+ | l_id | r_id | r_other | +------+------+---------+ | 2 | 2 | NULL | | 3 | NULL | NULL | +------+------+---------+
This query is buggy, because the two rows are returned for completely different reasons, and you can’t be sure which is which. IS NULL clauses can safely be placed on the columns used in the JOIN clause, but not on other columns in the outer table that might be NULL.
Bug 2: an OUTER JOIN is converted to INNER
If you place a non-null-safe comparison operator on any column in the outer table that isn’t part of the JOIN clause, you implicitly disable the outer-ness of the query and convert it to an INNER JOIN. Here’s an example:
select * from L left join R on l_id = r_id where r_other > 1; +------+------+---------+ | l_id | r_id | r_other | +------+------+---------+ | 1 | 1 | 5 | +------+------+---------+
The left-outer-ness of the above query is what causes the third row to be output in the first query I showed you above. The greater-than operator in this example automatically makes the left-ness impossible, because anytime there’s a row in the inner table that has no match in the outer table, it’ll be filled in with NULLs, and those NULLs will be eliminated by the operator. So the effect is that only matching rows will ever be output.
If you want to ponder variations and subtleties of the above, you can read more discussion on the issue report where we’re hammering out the details of automatically detecting and warning about these sneaky errors.
Zack Urlocker says MySQL 5.1 has zero bugs. He may have been misquoted, or quoted out of context, but there it is. I’ll quote enough of it that you can’t take it out of context twice:
Mickos also said MySQL 5.1 has upgraded its reliability and ease of use over 2005′s v5.0.
“Now we can admit it, but this version is much improved over 5.0, which we weren’t totally happy with,” Mickos confided.
He reported that more than 1,300 bugs (997 in 2007, 386 so far in 2008) have been fixed in v5.1, and that, according to standard DBT2 benchmarks, the performance of v5.1 is 10 to 15 percent better than the previous version.
“This version now has zero bugs,” Urlocker told eWEEK.
You can check for yourself at the MySQL bug statistics page.
Of course it’s not true. But what did Zack really say, I wonder?