Xaprb

Stay curious!

Archive for the ‘null’ tag

Two subtle bugs in OUTER JOIN queries

with 5 comments

OUTER JOIN queries in SQL are susceptible to two very subtle bugs that I’ve observed a number of times in the real world. Daniel and I have been hammering out ways to automatically detect queries that suffer from these bugs, in a relatively new Maatkit tool called mk-query-advisor. It’s part of our series of advisor tools for MySQL. I wrote a blog post about it a while ago. Automated analysis of bad query patterns is a good thing to write tools to do, because catching buggy queries is hard work if you do it manually.

Let’s dive right in and analyze these subtle bugs. Warning: if you don’t understand how SQL handles NULL, you’re not going to understand the following. Many people have a hard time with NULL, which is why these bugs are so hard to understand and avoid. This is one reason why SQL is a hard language to use properly.

Bug 1: a column could be NULL for two reasons, and you can’t distinguish them

If the outer table in your query contains NULL-able columns, and you place a WHERE clause to filter out all but those rows, you’re going to get bugs because a non-matching row in the outer table will be all-NULL. Here’s an example. Let’s start with a plain outer join query:

select * from L left join R on l_id = r_id;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    1 |    1 |       5 | 
|    2 |    2 |    NULL | 
|    3 | NULL |    NULL | 
+------+------+---------+

Here we see that one row in the outer table is missing, and one row (the middle row) has a NULL r_other column. Now, let’s add a WHERE clause:

select * from L left join R on l_id = r_id where r_other is null;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    2 |    2 |    NULL | 
|    3 | NULL |    NULL | 
+------+------+---------+

This query is buggy, because the two rows are returned for completely different reasons, and you can’t be sure which is which. IS NULL clauses can safely be placed on the columns used in the JOIN clause, but not on other columns in the outer table that might be NULL.

Bug 2: an OUTER JOIN is converted to INNER

If you place a non-null-safe comparison operator on any column in the outer table that isn’t part of the JOIN clause, you implicitly disable the outer-ness of the query and convert it to an INNER JOIN. Here’s an example:

select * from L left join R on l_id = r_id where r_other > 1;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    1 |    1 |       5 | 
+------+------+---------+

The left-outer-ness of the above query is what causes the third row to be output in the first query I showed you above. The greater-than operator in this example automatically makes the left-ness impossible, because anytime there’s a row in the inner table that has no match in the outer table, it’ll be filled in with NULLs, and those NULLs will be eliminated by the operator. So the effect is that only matching rows will ever be output.

If you want to ponder variations and subtleties of the above, you can read more discussion on the issue report where we’re hammering out the details of automatically detecting and warning about these sneaky errors.

Written by Xaprb

August 2nd, 2010 at 10:38 pm

Posted in Maatkit,SQL

Tagged with ,

The difference between a unique index and primary key in MySQL

with 3 comments

There’s a really important difference between a unique index (MySQL’s answer to a “unique constraint”) and a primary key in MySQL. Please take a look at this:

CREATE TABLE `t` (
  `a` int,
  `b` int,
  `c` int,
  UNIQUE KEY `a` (`a`,`b`)
)

The combination of columns a, b should uniquely identify any tuple in the table, right?

select * from t;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    2 |    3 | 
| NULL | NULL |    1 | 
| NULL | NULL |    1 | 
| NULL | NULL |    1 | 
+------+------+------+

Wrong. Our arch-enemy NULL messes things up again:

A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. This constraint does not apply to NULL values except for the BDB storage engine. For other engines, a UNIQUE index allows multiple NULL values for columns that can contain NULL

MySQL doesn’t let you define a primary key over nullable columns, for this reason. This is as of version 4.0, I believe — I recall that in version 3.23 there was nothing special about a primary key; it was just a unique non-nullable index named PRIMARY.

Written by Xaprb

September 12th, 2009 at 5:07 pm

Posted in SQL

Tagged with , , ,

How to avoid an extra index scan in MySQL

without comments

Is your MySQL server doing an extra index scan on queries that need to check a key for matches or NULL? It’s easy for this to happen accidentally, but it’s also easy to fix, especially in MySQL 5.0 and up. Here’s how.

If you read the manual page for EXPLAIN, you’ll see the ref_or_null “join type” (I think “access type” is really a better term) mentioned. If you see this in EXPLAIN, it means MySQL has to search the index for matches, then search again for NULLs. If there are no NULLs in that column, and MySQL knew that, it could avoid the extra search.

You might see this in subqueries or when you use elaborate JOIN clauses, or even when you use a simple WHERE clause. For example, here’s a query that will do a ref_or_null access plan on the Sakila sample database:

explain select * from sakila.film
where original_language_id = 3 or original_language_id is null\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: film
         type: ref_or_null
possible_keys: idx_fk_original_language_id
          key: idx_fk_original_language_id
      key_len: 2
          ref: const
         rows: 476
        Extra: Using where

Why does this happen? Simple: the indexed column is defined as NULLable. Here’s a query you can use to find all such columns:

select c.table_schema, c.table_name, c.column_name
from information_schema.columns as c
   inner join information_schema.key_column_usage using(table_schema, table_name, column_name)
where c.table_schema = 'sakila' and c.is_nullable = 'yes';
+--------------+------------+----------------------+
| table_schema | table_name | column_name          |
+--------------+------------+----------------------+
| sakila       | film       | original_language_id | 
| sakila       | payment    | rental_id            | 
+--------------+------------+----------------------+

If the column shouldn’t be allowed to be NULL, make sure you specify that in the column’s options! In fact, the original_language_id column probably should be defined as NULLable, but I commonly see columns defined as NULLable when they shouldn’t be. The performance penalty isn’t the end of the world, but it’s still worth fixing.

Written by Xaprb

July 27th, 2007 at 8:07 am

Posted in Uncategorized

Tagged with , , , , , ,