How to write a SQL exclusion join
There is usually more than one way to write a given query, but not all ways are created equal. Some mathematically equivalent queries can have drastically different performance. This article examines one of the motivations for inventing LEFT OUTER join and including it in the SQL standard: improved performance through exclusion joins.
LEFT OUTER join syntax was added to the SQL-92 standard specifically to address certain queries that had only been possible with NOT IN subqueries. The disadvantage of using subqueries in these situations is that they may require creating many anonymous tables and probing into them. A clever optimizer could generate the same plan as a LEFT OUTER join, but since there was no such thing at the time and query optimizers were much less capable, query performance could take quite a hit. I should pause here and say that I wasn’t programming in 1992, so I’m only speaking from the history I’ve read and heard, not from personal experience. However, I definitely have personal experience with the performance hits of NOT IN queries!
Setup
I’ll use two tables of data, apples and oranges.
| Variety | Price |
|---|---|
| Fuji | 5.00 |
| Gala | 6.00 |
| Variety | Price |
|---|---|
| Valencia | 4.00 |
| Navel | 5.00 |
The old-style way
In old-style SQL, one joined data sets by simply specifying the sets, and then specifying the match criteria in the WHERE clause, like so:
select *
from apples, oranges
where apples.Price = oranges.Price
and apples.Price = 5
Placing the join conditions in the WHERE clause is confusing when queries get more complex. It becomes hard to tell which conditions are used to join the tables (apples.Price = oranges.Price), and which are used to exclude results (apples.Price = 5). The two are equivalent in old-style joins, but as mentioned, some joins cannot be written in this style (more on this later).
The new way
The updated SQL standard addressed these issues by separating the join conditions from the WHERE clause. Join conditions now go in the FROM clause, greatly clarifying the syntax. Here is the simple join written in the newer style:
select *
from apples
inner join oranges
on apples.Price = oranges.Price
where apples.Price = 5
Outer joins
Separating the join conditions from the WHERE clause allows OUTER joins. There are three kinds of OUTER joins: LEFT, RIGHT and FULL. The most common is a LEFT OUTER join, but all three types have the characteristic of not eliminating rows entirely from the result set when they fail the condition. Instead, when data does not match, the row is included from one table as usual, and the other table’s columns are filled with NULLs (since there is no matching data to insert).
In a LEFT OUTER join, every row from the left-hand table is included, whether there is a matching row in the right-hand table or not. When there is a matching row in the right-hand table, it is included; otherwise the right-hand table’s columns are filled with NULLs. A demonstration may clarify:
select *
from apples
left outer join oranges
on apples.Price = oranges.Price
| Variety | Price | Variety | Price |
|---|---|---|---|
| Fuji | 5.00 | Navel | 5.00 |
| Gala | 6.00 | NULL | NULL |
INNER joins select matching rows in the result set. It is possible to use an INNER join to select apples and oranges with matching prices, as above. With LEFT OUTER joins it is possible to answer the reverse query, “show me apples for which there are no oranges with a matching price.” Simply eliminate matching rows in the WHERE clause:
select apples.Variety
from apples
left outer join oranges
on apples.Price = oranges.Price
where oranges.Price is null
Outer joins are not possible with inner join
The above query is not possible with INNER JOIN. The following query does not accomplish the same thing:
select apples.Variety
from apples
inner join oranges
on apples.Price = oranges.Price
where apples.Price <> oranges.Price
In fact, this query will return nothing, because the join condition contradicts the WHERE clause. This query is not the same thing either:
select apples.Variety
from apples
inner join oranges on
apples.Price <> oranges.Price
Why? Because if there are no rows in oranges, nothing will get returned. It is simply not possible to write this query with an INNER join or an old-style join, no matter what technique is used. Don’t be fooled by analyzing the two data sets presented in this article; for some cases you may be able to get the same behavior, but not for all possible data sets. There is a way to write this query using subqueries, though:
select apples.Variety
from apples
where apples.Price not in (
select Price from oranges)
Outer joins and subqueries
Why use a LEFT OUTER join instead of using a subquery? Depending on the query, this technique may force the subquery to be evaluated for every row in the left-hand table (especially for correlated subqueries, where the subquery refers to values from the left-hand table). A LEFT OUTER join, by contrast, can often use a much more efficient query plan. Again, they may be mathematically equivalent — and a good query optimizer may generate the same query plan, but this is not always the case. It depends heavily on the query, the optimizer, and how the tables are indexed. I have seen queries perform orders of magnitude better when rewritten with an exclusion join.



What a helpful piece! I’m relatively new to SQL so it was great to see someone go in depth and explain what I was trying to do. Keep it up!
John
17 Feb 06 at 12:23 pm
Agreed. Nice article.
Steve
22 Feb 06 at 8:40 pm
Very well written. Thank you for sharing your insight!
Kyle W. Cartmell
13 Apr 06 at 4:07 pm
Extremely helpful! Thanks for posting this article!
Anna
11 May 06 at 10:46 am
Wonderful pointer. Saved my day.
Terence Koh
29 Jun 06 at 10:33 pm
I was looking for a way to exclude rows without using the NOT IN. The LEFT OUTER join is yielding the correct results so far. Thanks for the article.
Andre
22 Dec 06 at 6:41 pm
Nice article. i was lookin for the same. keep it up.
Anoop Nair
6 Feb 07 at 3:20 am
Very helpful to exclude rows when combining two tables.
chris
6 Mar 07 at 4:19 pm
Very useful! I was looking for a way to remove rows that had no association anymore, and the LEFT OUTER JOIN you posted is exactly what I needed. thanks!
LKRaider
3 Apr 07 at 9:51 am
About my previous comment, just to note that it is not a simple case of replacing SELECT for DELETE, but you have to tell DELETE which table to actually remove rows from. As in:
DELETE table1 FROM table1 LEFT OUTER JOIN table2 …. etc
LKRaider
3 Apr 07 at 10:05 am
Great article.
Please add more advanced tricks.
Can’t get enough of them. :)
Pino
17 May 07 at 10:09 am
Great article, really helped me shift from using multiple inner join queries for exclusion joins to single left outer joins to accomplish the same thing.
James
9 Dec 07 at 8:00 am
Actually, OUTER JOINS were available in many RDBMS packages before SQL92, but the SQL syntax for those joins was non-standard. Oracle used the weird ( ) operator. MySQL supported this syntax until 4.0, but it was removed in 4.1 in favor of support only for the SQL standard.
The Oracle 7.3 syntax was:
SELECT *
FROM t1,
t2
WHERE t1.some_id = t2.some_id ( )
Justin Swanhart
31 Jan 08 at 11:27 pm
Well that is odd. There are supposed to be plus signs between those empty parenthesis.
Justin Swanhart
31 Jan 08 at 11:28 pm
Thanks for the article, it helped me a lot because I am MYSTIFIED by joins.
My question is sort of an aside:
The only reason I use joins is for these specific cases, where they offer me the ability to exclude or miss fields. Conceptually, I think the JOIN syntax makes no sense at all, and is kind of a lame convenience kludge. I have always naturally used the ‘old school’ method of comparing tables because that one has a simple standard logic of joining, grouped in the where clause.
My question is, can you explain to me why you (or lots of people) think the JOIN syntax itself is somehow more sensical than the ‘old-style’ way, because I don’t see it.
Steve Bussetti
15 Feb 08 at 3:12 pm
Steve, the JOIN clauses can be written as WHERE clauses for inner joins, but I prefer to separate them anyway. It helps me see more clearly which code is there to match rows between tables, and which code is for filtering out rows from the logical result. (Even though I know that the query optimizer really applies the filters as it performs the join, not after, it makes complicated joins easier for me to write if I pretend it doesn’t).
Xaprb
16 Feb 08 at 12:48 pm
Very well written and useful!! Thank you.
Travis McPeak
13 Mar 08 at 9:56 pm
I have a query which works fine in oracle, but it is hogging memmory
in mysql. How to write this query in MYSQL in a efficent way?
delete from table1
where not exists(select * from table2 where table1.id = table2.id)
Thanks in advance
Praveen
Praveen
28 Aug 08 at 11:10 am
I have some porblem in writting query
i have to find out the letest status of perticuler item when there is multiple similer entries.
but I want latest value for the perticuler entry
plz help me
suhas
10 Sep 08 at 5:44 am
This was very helpful, I had an exclusion based on a subquery that was so slow it crashed the system multiple times. Thanks for the help!
Jordan
26 Nov 08 at 12:29 pm
Quite helpful- Thank You for sharing.
Hussein
13 Feb 09 at 8:04 am
Wow. Thanks man, you are a stress saver
Sola Olu
5 Jun 09 at 6:38 am
Great! Help me a lot!!
Lucian
3 Sep 09 at 2:49 am
Thanks!
Liv Detrick
17 Sep 09 at 1:29 pm
May i know how to write the outer-join between table R (A,B) and S (B,C) in SQL without using the outerjoin syntax?
Shino
6 Apr 10 at 2:26 am
Thanks a lot! New to SQL and DB, and this is a very useful and well-written article.
abc
28 May 11 at 5:56 am
I had exactly the same problem excluding results from a JOIN, and this comprehensive explanation perfectly fixed the issue. The subquery was the solution – and the whole query isn’t so slow after all. Thank you for sharing this!
E. Serrano
23 Jun 12 at 5:09 pm
Been battling with inserting data across many tables in one query in which different conditions should be checked before inserts are considered with a complete collection of datasets for example, CHECK TABLE ONE IF DATA ONE EXISTS IF YES IGNORE THE DATA AND CHECK IF A COMBINATION OF DATA ONE AND DATA TWO EXITS IN TABLE TWO IF YES IGNORE AND CHECK IF A COMBINATION OF DATA ONE, TWO AND THREE EXIST IN TABLE THREE…etc
Timi
9 Sep 12 at 2:36 am