Xaprb

Stay curious!

How to find the max row per group in SQL without subqueries

with 11 comments

A while ago I wrote about how to select the first, minimum, maximum or least row per group in SQL. This article shows how to solve this problem without subqueries.

Like many SQL problems, the key to understanding the solution is to rephrase the English question to make it easy to translate into SQL.

My first try

This is exactly the same problem as in my earlier article on how to select the first/least/max row per group in SQL. The only difference is subqueries are disallowed.

I love finding ways to do things without subqueries, even if I don’t have to. My first article on this blog was about how to write subqueries in the FROM clause without using subqueries. These tricks can occasionally be useful in very early versions of MySQL, which I still work with (I recently completed a consulting job where the only database available is MySQL 3.23).

My first thought on this problem was to use a mutex table, another blast from the past. It would work, but it’s not the best way to do it.

The solution

Let’s say you have a table of people, and you want to find the youngest of each gender. Here’s the table:

The problem is easy if I rephrase it as “find all people where there is no younger person of the same gender.” That’s easy to write as a join and translate into an exclusion join:

  • Find all people — easy.
  • And for each person, find all younger people of the same gender — okay, join on gender and “age less than.”
  • Discard each row where there is a younger person — change the join to an exclusion join.

Here are the first two bullet points in SQL:

select young.*, younger.age
from person as young
   left outer join person as younger on younger.gender = young.gender
      and younger.age < young.age

+------+-------+--------+------+
| age  | name  | gender | age  |
+------+-------+--------+------+
|    5 | david | m      | NULL | 
|    8 | john  | m      |    5 | 
|    9 | jane  | f      |    4 | 
|    4 | kelly | f      | NULL | 
|   11 | mary  | f      |    9 | 
|   11 | mary  | f      |    4 | 
|   13 | kay   | f      |    9 | 
|   13 | kay   | f      |    4 | 
|   13 | kay   | f      |   11 | 
+------+-------+--------+------+

Look at the rightmost column. There are NULLs only in rows where there’s no younger person of the same gender. Now it’s easy to “cross out” the other rows with the WHERE clause, and we’re done:

select young.*
from person as young
   left outer join person as younger on younger.gender = young.gender
      and younger.age < young.age
where younger.age is null;

+------+-------+--------+------+
| age  | name  | gender | age  |
+------+-------+--------+------+
|    5 | david | m      | NULL | 
|    4 | kelly | f      | NULL | 
+------+-------+--------+------+

How efficient is it?

As long as you have appropriate indexes on the table, this might not be as inefficient as you’d think. It’s theoretically a cross join, yes, but in reality if there’s a good index it’s only a repeated cross join on subsets of the data. In other words, you need a (gender, age) index on this table. Gender isn’t a very good example to use for this, because it will never be very selective, but if you only have a few rows per group and you have a leftmost index on the grouping column, it should work fine.

Conclusion

As with so many other SQL challenges, if you re-phrase the question, it’s easy to select the maximum or minimum row per group without subqueries. The key is to understand what you want, and to be able to word the problem in a way that translates from English to SQL.

Written by Xaprb

March 14th, 2007 at 8:32 pm

Posted in SQL

11 Responses to 'How to find the max row per group in SQL without subqueries'

Subscribe to comments with RSS

  1. Some more ways to solve the same problem again and again: http://jan.kneschke.de/projects/mysql/groupwise-max :)

    Jan Kneschke

    15 Mar 07 at 5:44 am

  2. If you only need to see min/max values and the group, then

    select gender,min(age) from person group by gender

    will work for you too.

    Anonymous

    15 Mar 07 at 12:47 pm

  3. Can’t you do
    select *,min(age) from person group by gender? Maybe not.

    Aaron

    16 Mar 07 at 2:42 pm

  4. Aaron, that query is possible in some RDBMSs (notably MySQL with strict mode disabled) but it’s a many-to-one trap.

    Xaprb

    17 Mar 07 at 8:18 am

  5. This looks good when you only have one person of minimum age for each genger. If you add another row, say

    5 robert m

    You will get three rows in the result, which may not be what was wanted.

    RobertPhoenix

    1 Aug 08 at 8:04 pm

  6. [...] but effective–just follow the guidelines in this post. Performance is great, much better by far than using sub-queries. I found a marginal performance [...]

  7. [...] How to find max without subqueries [...]

  8. Fascinating and clearly explained.

    I thought I was overlooking something – always using subqueries when I do this, but this may be very useful for me.

    Vince Taylor

    9 Apr 09 at 4:32 am

  9. wonderful method.

    when you need create a view in mysql which you want to get the latest item in every group by the some date col, this method is perfect because mysql doesn’t allow subquery in the view

    Jack Su

    30 Jun 09 at 6:10 am

  10. Thanks!!! This is exactly what I was trying to figure out!

    Joe

    30 Jul 09 at 2:37 pm

  11. To me works fine too, in this case show this person as well as required

    >This looks good when you only have one person of >minimum age for each genger. If you add another row, >say

    >5 robert m

    >You will get three rows in the result, which may not >be what was wanted.

    Ignacio

    27 May 10 at 12:05 pm

Leave a Reply