How to write SQL JOIN clauses more compactly
Posted in Databases on Nov 28, 2006
Here’s a way to write join clauses so they are more compact, more readable, and less confusing.
Standard SQL:2003 defines a
USING clause that can serve the same function as the
ON clause in the familiar
JOIN syntax. For example, the following join
select a.col1, b.col2 from a inner join b on a.col3 = b.col3
May be written as follows:
select a.col1, b.col2 from a inner join b using(col3)
That may not look like much of an improvement, but it is a big help in larger joins where many tables have columns with the same names. In these cases, not only is it tedious to write out every pair of columns that must match in the join, you often have to refer to the tables with aliases too. And it’s tough to read such a join and understand it, or debug it. For example, what’s wrong with this join?
select tbl1.col1, tbl2.col2, tbl3.col2, tbl4.col1 from apples as tbl1 inner join oranges as tbl2 on tbl1.col3 = tbl2.col3 inner join grapes as tbl3 on tbl3.col3 = tbl2.col3 inner join peaches as tbl4 on tbl3.col3 = tbl2.col3
The statement is valid and will execute, but it won’t give the results you probably wanted (tbl4′s join clause doesn’t refer to any columns from tbl4). The bug is even harder to find if the statement isn’t neatly indented and consistently organized. That statement is better written with
select tbl1.col1, tbl2.col2, tbl3.col2, tbl4.col1 from apples as tbl1 inner join oranges as tbl2 using(col3) inner join grapes as tbl3 using(col3) inner join peaches as tbl4 using(col3)
USING matches the specified columns from each table, eliminating the need to write them out twice explicitly with aliases. In MySQL 5, you can see how the statement gets rewritten by the optimizer with
EXPLAIN EXTENDED followed by
SHOW WARNINGS. The result shows that it gets rewritten as an old-style join with the column-matching done in the
USING isn’t a drop-in replacement for the
ON clause in normal join syntax. There are some differences, especially on different platforms (and in MySQL’s case, even differences between versions of the product – see MySQL JOIN Syntax). This probably makes relational purists hot under the collar. Of course, everything about SQL makes relational purists mad, because SQL isn’t relational, and database products are even less relational than the SQL standard… oh well.
In my opinion, use it if it makes your life easier. Programming is hard – use whatever your tools give you to ease the burden.
One more note: there’s also a
NATURAL JOIN syntax that automatically discovers columns with the same names in both tables and matches them in the join. I don’t use this in programs, because if someone adds more columns to one of the tables involved in such a join, the join criteria will silently change. I think join criteria should always be explicit, for the same reason I avoid blind inserts. However, this syntax can be convenient for writing one-off queries at the command line.