Xaprb

Stay curious!

Two subtle bugs in OUTER JOIN queries

with 5 comments

OUTER JOIN queries in SQL are susceptible to two very subtle bugs that I’ve observed a number of times in the real world. Daniel and I have been hammering out ways to automatically detect queries that suffer from these bugs, in a relatively new Maatkit tool called mk-query-advisor. It’s part of our series of advisor tools for MySQL. I wrote a blog post about it a while ago. Automated analysis of bad query patterns is a good thing to write tools to do, because catching buggy queries is hard work if you do it manually.

Let’s dive right in and analyze these subtle bugs. Warning: if you don’t understand how SQL handles NULL, you’re not going to understand the following. Many people have a hard time with NULL, which is why these bugs are so hard to understand and avoid. This is one reason why SQL is a hard language to use properly.

Bug 1: a column could be NULL for two reasons, and you can’t distinguish them

If the outer table in your query contains NULL-able columns, and you place a WHERE clause to filter out all but those rows, you’re going to get bugs because a non-matching row in the outer table will be all-NULL. Here’s an example. Let’s start with a plain outer join query:

select * from L left join R on l_id = r_id;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    1 |    1 |       5 | 
|    2 |    2 |    NULL | 
|    3 | NULL |    NULL | 
+------+------+---------+

Here we see that one row in the outer table is missing, and one row (the middle row) has a NULL r_other column. Now, let’s add a WHERE clause:

select * from L left join R on l_id = r_id where r_other is null;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    2 |    2 |    NULL | 
|    3 | NULL |    NULL | 
+------+------+---------+

This query is buggy, because the two rows are returned for completely different reasons, and you can’t be sure which is which. IS NULL clauses can safely be placed on the columns used in the JOIN clause, but not on other columns in the outer table that might be NULL.

Bug 2: an OUTER JOIN is converted to INNER

If you place a non-null-safe comparison operator on any column in the outer table that isn’t part of the JOIN clause, you implicitly disable the outer-ness of the query and convert it to an INNER JOIN. Here’s an example:

select * from L left join R on l_id = r_id where r_other > 1;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    1 |    1 |       5 | 
+------+------+---------+

The left-outer-ness of the above query is what causes the third row to be output in the first query I showed you above. The greater-than operator in this example automatically makes the left-ness impossible, because anytime there’s a row in the inner table that has no match in the outer table, it’ll be filled in with NULLs, and those NULLs will be eliminated by the operator. So the effect is that only matching rows will ever be output.

If you want to ponder variations and subtleties of the above, you can read more discussion on the issue report where we’re hammering out the details of automatically detecting and warning about these sneaky errors.

Written by Baron Schwartz

August 2nd, 2010 at 10:38 pm

Posted in Maatkit,SQL

Tagged with ,

5 Responses to 'Two subtle bugs in OUTER JOIN queries'

Subscribe to comments with RSS

  1. To overcome bug 1 you should only use non-NULL columns in the WHERE clause to be sure there is no right row. The best is always to use the same column as the joined one:

    SELECT * FROM l LEFT JOIN r ON l_id = r_id WHERE r_id IS NULL;

    Bug 2 can be made safe by taking any right-table-related filter into the JOIN clause itself rather than into a WHERE clause:

    SELECT * FROM l LEFT JOIN r ON l_id = r_id AND r_other > 1;

    Anse

    3 Aug 10 at 3:45 am

  2. I see these regularly too. The second one in particular causes the server to do a lot of extra work to do the outer join, just to discard the outer part because of the where clause.

    But just to be clear, these are bugs in the QUERIES, not in SQL itself or any particular implementation of outer joins. The behavior is correct as described, but it can trip up query writers who are not aware that they act like that.

    John

    3 Aug 10 at 11:33 am

  3. Both of these points are minor annoyances we’ve all encountered in SQL, but they really aren’t bugs — they are by design, and have several flexible workarounds.

  4. I should clarify when I say “bug”. I do not mean that SQL is buggy. I mean that people write bugs in SQL just as they write bugs in C, for example,


    if ( returnval = 1 ) {
    format_hard_drive();
    }

    That’s a bug. It’s not a bug in C, though.

    Xaprb

    3 Aug 10 at 4:39 pm

  5. The SQL does what it says. Any resulting bugs are the direct result of the dev’s misunderstanding of what the SQL says and subsequent inappropriately use.
    If you can figure out how to correct for lack of insight, please let me know! I’ll apply the patch to my system immediately.

    RJ

    27 Oct 10 at 2:26 pm

Leave a Reply