Xaprb

Stay curious!

Speaking at NovaRUG on Thursday

with 2 comments

I’ll be joining the NovaRUG (Northern Virginia Ruby Users’ Group) on Thursday to talk about MySQL performance. See their blog for the details and how to RSVP.

Written by Xaprb

August 16th, 2010 at 9:38 am

Posted in Conferences, SQL

Tagged with , ,

Two subtle bugs in OUTER JOIN queries

with 4 comments

OUTER JOIN queries in SQL are susceptible to two very subtle bugs that I’ve observed a number of times in the real world. Daniel and I have been hammering out ways to automatically detect queries that suffer from these bugs, in a relatively new Maatkit tool called mk-query-advisor. It’s part of our series of advisor tools for MySQL. I wrote a blog post about it a while ago. Automated analysis of bad query patterns is a good thing to write tools to do, because catching buggy queries is hard work if you do it manually.

Let’s dive right in and analyze these subtle bugs. Warning: if you don’t understand how SQL handles NULL, you’re not going to understand the following. Many people have a hard time with NULL, which is why these bugs are so hard to understand and avoid. This is one reason why SQL is a hard language to use properly.

Bug 1: a column could be NULL for two reasons, and you can’t distinguish them

If the outer table in your query contains NULL-able columns, and you place a WHERE clause to filter out all but those rows, you’re going to get bugs because a non-matching row in the outer table will be all-NULL. Here’s an example. Let’s start with a plain outer join query:

select * from L left join R on l_id = r_id;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    1 |    1 |       5 | 
|    2 |    2 |    NULL | 
|    3 | NULL |    NULL | 
+------+------+---------+

Here we see that one row in the outer table is missing, and one row (the middle row) has a NULL r_other column. Now, let’s add a WHERE clause:

select * from L left join R on l_id = r_id where r_other is null;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    2 |    2 |    NULL | 
|    3 | NULL |    NULL | 
+------+------+---------+

This query is buggy, because the two rows are returned for completely different reasons, and you can’t be sure which is which. IS NULL clauses can safely be placed on the columns used in the JOIN clause, but not on other columns in the outer table that might be NULL.

Bug 2: an OUTER JOIN is converted to INNER

If you place a non-null-safe comparison operator on any column in the outer table that isn’t part of the JOIN clause, you implicitly disable the outer-ness of the query and convert it to an INNER JOIN. Here’s an example:

select * from L left join R on l_id = r_id where r_other > 1;
+------+------+---------+
| l_id | r_id | r_other |
+------+------+---------+
|    1 |    1 |       5 | 
+------+------+---------+

The left-outer-ness of the above query is what causes the third row to be output in the first query I showed you above. The greater-than operator in this example automatically makes the left-ness impossible, because anytime there’s a row in the inner table that has no match in the outer table, it’ll be filled in with NULLs, and those NULLs will be eliminated by the operator. So the effect is that only matching rows will ever be output.

If you want to ponder variations and subtleties of the above, you can read more discussion on the issue report where we’re hammering out the details of automatically detecting and warning about these sneaky errors.

Written by Xaprb

August 2nd, 2010 at 10:38 pm

Posted in Maatkit, SQL

Tagged with ,

Speaking at MySQL Meetup in Northern Virginia

with 3 comments

The closest thing I know of to a “Northern Virginia MySQL Meetup” is the Sterling Database Data Solutions Group. I got in touch with the organizer and we scheduled a meeting next Wednesday July 28th. I’ll be presenting, and so will someone from Fusion-IO, a solid-state storage vendor. This is on short notice, so tell your friends about it! It would be great to grow a strong monthly meetup presence in this area.

Here’s the abstract I sent: “This talk covers best practices to help you get the most out of MySQL performance. It assumes you know a database well, though it need not be MySQL. We’ll cover several angles of the topic. Configuration is usually the first thing people ask about. Although it’s possible to misconfigure MySQL and get bad performance, the configuration options you need for good performance are few and rather simple. We’ll see how to inspect MySQL’s performance and status, also a fairly simple subject. Next is query tuning. There are a few surprises in MySQL due to its simpler query execution engine than Oracle or SQL Server. We’ll see how to avoid those surprises and work with the query optimizer. Finally, we’ll focus on what you should know if you are considering migrating part or all of your application from Oracle. There will be plenty of time for questions, so bring yours!”

Written by Xaprb

July 21st, 2010 at 9:34 am

Posted in Conferences, SQL

Tagged with