The dangerous subtleties of LEFT JOIN and COUNT() in SQL
I want to show you two very expensive SQL idioms that almost everyone uses. These are not query optimization problems — they are financially expensive patterns of SQL.
These queries don’t do what the user thinks they do, and thus they create subtle bugs. And the bugs cause a lot of trouble, and they’re hard to track down, and much time is spent on them. I’ve been meaning to write about this for a while, but a book I read recently brought this topic up again, so I dug this draft out and finished it.
Pattern One: LEFT JOIN
There is a big problem with LEFT JOIN. Can you think what it is? If not, ask yourself what’s the most confusing thing about SQL? That’s right, it’s NULL. People don’t understand NULL, and even people who do understand it have trouble with it, because it’s ugly and nasty and illogical. And LEFT JOIN introduces NULL into your queries, even when there aren’t any NULLs in your data.
What’s wrong with this query?
select tweedle, dee, dum from table1 left join table2 on table1.foo = table2.foo where bar = 5;
If you don’t see a problem, you’ve just discovered why this is an expensive query pattern. Suppose I rewrote it like this?
select tweedle, dee, dum from table1 left join table2 on table1.foo = table2.foo where table2.bar = 5;
Consider what happens when there is no matching row in table2. SQL fills in the “missing” row with NULL. And what happens then? table2.bar = 5 is unknown (it’s neither true nor false), so that entire row is eliminated from the result.
And as a result, the query is exactly equivalent to an INNER JOIN. Subtle but true; and I see this query pattern over and over again in the real world. A significant amount of the time that I’m asked to do query optimizations (20% of the time? maybe more?), I end up explaining that a LEFT JOIN query doesn’t do what the user thinks it does.
Pattern Two: COUNT()
Ironically, COUNT() is one of the few SQL aggregate functions that actually has correct behavior according to the relational model some of the time (but not all of the time). Unfortunately, it has two problems: 1) all the other functions behave incorrectly all of the time according to the relational model, so it’s different, and 2) it has a special-case behavior that many people don’t understand.
To recap briefly, COUNT() has two behaviors. If you say COUNT(expression), then it counts the number of rows in the result for which that expression is not NULL. In other words, it counts the number of values in the result. The other behavior is COUNT(*), which counts the cardinality (number of rows) of the result.
There’s a lot to discuss here, but I want to look at just one thing. Let’s look at the following query:
select user.userid, count(email.subject) from user inner join email on user.userid = email.userid group by user.userid;
What’s happening in that query? There are a few possibilities I see. The most obvious way to translate this query into English is “get the number of conversations for each user.” Except that’s not what it does. It counts the number of emails each user has. If I wanted the number of conversations, I’d need to say “count(distinct email.subject)”. You need the DISTINCT keyword there.
I said it counts the number of emails each user has, but it really doesn’t. It actually counts the number of times there is a value in email.subject (e.g. it is NOT NULL). If the column is defined to allow NULL, there might be a difference between the number of values in email.subject and the number of emails! So, if the query’s author really wanted to know the number of emails, the query should be this:
select user.userid, count(*) from user inner join email on user.userid = email.userid group by user.userid;
But what if that’s not what the author of the query meant? There’s no way to really know. There are several possible intended meanings for the query, and there are several different ways to write the query to express those meanings more clearly. But the original query is ambiguous, for a few reasons. And everyone who reads this query afterwards will end up guessing what the original author meant. “I think I can safely change this to…”
This example is meant to be simple, so it doesn’t illustrate perfectly, but I’ve seen lots of cases where many different interpretations are equally reasonable. I often end up being asked to optimize performance for such queries, and again I can only write back to the client “someone needs to tell me what this query is supposed to do, because I can’t figure it out.”
Summary
So now you’ve seen two query idioms to avoid, and why to avoid them. And you’ve seen that the problems are really caused by NULL! If you’re interested in exploring this train of thought further, you should get a copy of SQL and Relational Theory by C. J. Date.



The first one is easily solved by moving the where clause in the left join clause, though.
Or am I missing something?
select tweedle, dee, dum
from table1
left join table2
on table1.foo = table2.foo
and table2.bar = 5
Thierry
8 Apr 09 at 5:46 am
Of course, but what I’ve observed in practice from analyzing lots of people’s queries is that they miss this. And in a complex join, it is very difficult to analyze every condition in the WHERE clause and verify that it doesn’t restrict a LEFT JOIN to an INNER JOIN.
Xaprb
8 Apr 09 at 7:06 am
The first left join example confused me a bit, until I saw the second one. I would assume just by looking at the first example that bar was a column in table1, rather than table2. I guess in real life, you’d have access to the table definitions clearing up any such ambiguity.
William
8 Apr 09 at 3:53 pm
In a more complex query (such as you see in the real world), it’s hard to do that. People don’t alias their table names, and there could be 5 tables with 20 columns each. You have to cross-reference the 100 columns to find out what is coming from where.
It’s a perfect thing for a tool to do, of course. But queries that have to have tool support to be written correctly are more costly than those a human can see are correct.
Xaprb
8 Apr 09 at 5:00 pm
When I first started writing queries to generate reports I made the left join mistake many times. I finally figured it out when the data from two queries weren’t coming out the same. I also wrote a post about it hoping to keep people from making the same mistake, but there will always be people using left join without fulling understanding it like I was.
Greg Allard
9 Apr 09 at 11:10 am
Thanks for the tips, ‘Pattern Two’ was just what I was looking for when Google found this page for me.
P.S. I almost didn’t enter this comment because I really do not love SQL and did not want to type honk.
Keith Thomas
20 Apr 09 at 7:18 pm
Keith, thanks for making my day, I got a good smile out of that one! I am always “signed in” so I forget the things I subject my guests to…
Xaprb
20 Apr 09 at 7:56 pm
I read your article and it’s probably the closest one to my current issue, but it’s not a complete answer for me.
Do you have a forum that you post to where I can get a solution to my SQL count issue? If so I would greatly appreciate your assistance.
David
2 Oct 09 at 4:29 am