Some queries that seemingly require the use of subqueries in the FROM clause (commonly called derived tables) can be written without them. This is useful for earlier versions of MySQL, which do not support subqueries. In this article I’ll explain how to get the effect of subqueries without actually using them.
The basic principle that makes this work is this: a subquery in the FROM clause really works like an anonymous view, and can be accomplished as joins to base tables, as long as the joins don’t interact.
I’ll demonstrate how to add a sum across two grouped subqueries with a single grouped select. The three example tables are as follows:
create table category ( uid int primary key, title varchar(50) ); create table item ( uid int primary key, cat int, qty int, checkedout int ); create table bulk_checkout ( uid int primary key, cat int, qty int );
These tables represent a very simplified version of an inventory system I maintain. Items are entered individually where possible, and tagged with their ID number; tents, bikes etc are entered individually, with a quantity of 1. There are also items that we don’t count separately, such as AA batteries. These are entered together as a single item, and given a quantity of 30, for example.
When we check items out, we either check out a certain item, say item #47, or we check out X items of category Y. For example, 15 AA batteries would be checked out as qty 15 of category ‘Misc’. We check out a specific item by updating its
checkedout column, but we check out in bulk by inserting into
How I’d write the query with subqueries
The example query is a summary of item count and checked-out count, grouped by category. Here is how I would write this query with subqueries:
select category.title, it.qty, coalesce(it.qtyout, 0) + coalesce(bulk.qtyout, 0) as qtyout from category left outer join ( select cat, sum(qty) as qty, sum( case when checkedout = 1 then qty else 0 end ) as qtyout from item group by cat ) as it on it.cat = category.uid left outer join ( select cat, sum(qty) as qtyout from bulk_checkout group by cat ) as bulk on bulk.cat = category.uid
A failed attempt
Here is how I first tried to write this without subqueries. It will not work; see if you can figure out why:
select category.title, sum(it.qty) as qty, sum( case when checkedout = 1 then it.qty else 0 end + coalesce(bulk_checkout.qty, 0)) as qtyout from category left outer join item as it on it.cat = category.uid left outer join bulk_checkout on bulk_checkout.cat = category.uid
It will not work because the joins may cause rows to appear more than once in the result set, which will cause them to be counted too many times in the sums. For instance, if there are two entries in
bulk_checkout for category 1, every row in
item for category 1 will be duplicated, and the
qty will be twice too large. You may think you can divide by
count(*), or take averages, or do some other such magic, but I don’t think there is a way to do so.
Why is this? The subselects need to be independent, so rows in bulk
item must not have any effect on each other (via the join) as discussed above.
I need a way to join to both tables in one query, while having the effect of two queries that each join only to one of them. How is this possible? I can think of only one way: join on some mutually exclusive values, so rows from one table aren’t mixed with rows from the other table. A mutex table is the only answer I am aware of. Here is the query written with the mutex table:
select category.title, sum(it.qty) as qty, sum( case when checkedout = 1 then it.qty else 0 end + coalesce(bulk_checkout.qty, 0)) as qtyout from category inner join mutex on i in (0, 1) left outer join item as it on it.cat = category.uid and i = 0 left outer join bulk_checkout on bulk_checkout.cat = category.uid and i = 1 group by category.title
I’m not saying subqueries should be rewritten like this. If your RDBMS supports them, subqueries can simplify and clarify queries, and may improve query performance (if you’re using them as you should). The mutex technique actually results in 50% null values on the right-hand side, which hurts performance. There is also another table in the join, which depending on the query plan chosen will cause twice the probing, inner looping, or hashing (because it has two rows). There is also grouping, which requires sorting (bad), and coalescing, (negligible).
I tried this idea on Microsoft SQL Server with
set statistics io on, and examined the query plan and performance on a small data set. The query plan is straightforward in either case, but using mutex joins resulted in more logical reads, the expected result. I do not have statistics on the performance on MySQL.