Xaprb

Stay curious!

How to find the max row per group in SQL without subqueries

A while ago I wrote about how to select the first, minimum, maximum or least row per group in SQL. This article shows how to solve this problem without subqueries.

Like many SQL problems, the key to understanding the solution is to rephrase the English question to make it easy to translate into SQL.

My first try

This is exactly the same problem as in my earlier article on how to select the first/least/max row per group in SQL. The only difference is subqueries are disallowed.

I love finding ways to do things without subqueries, even if I don’t have to. My first article on this blog was about how to write subqueries in the FROM clause without using subqueries. These tricks can occasionally be useful in very early versions of MySQL, which I still work with (I recently completed a consulting job where the only database available is MySQL 3.23).

My first thought on this problem was to use a mutex table, another blast from the past. It would work, but it’s not the best way to do it.

The solution

Let’s say you have a table of people, and you want to find the youngest of each gender. Here’s the table:

The problem is easy if I rephrase it as “find all people where there is no younger person of the same gender.” That’s easy to write as a join and translate into an exclusion join:

• Find all people — easy.
• And for each person, find all younger people of the same gender — okay, join on gender and “age less than.”
• Discard each row where there is a younger person — change the join to an exclusion join.

Here are the first two bullet points in SQL:

```select young.*, younger.age
from person as young
left outer join person as younger on younger.gender = young.gender
and younger.age < young.age

+------+-------+--------+------+
| age  | name  | gender | age  |
+------+-------+--------+------+
|    5 | david | m      | NULL |
|    8 | john  | m      |    5 |
|    9 | jane  | f      |    4 |
|    4 | kelly | f      | NULL |
|   11 | mary  | f      |    9 |
|   11 | mary  | f      |    4 |
|   13 | kay   | f      |    9 |
|   13 | kay   | f      |    4 |
|   13 | kay   | f      |   11 |
+------+-------+--------+------+```

Look at the rightmost column. There are `NULL`s only in rows where there’s no younger person of the same gender. Now it’s easy to “cross out” the other rows with the `WHERE` clause, and we’re done:

```select young.*
from person as young
left outer join person as younger on younger.gender = young.gender
and younger.age < young.age
where younger.age is null;

+------+-------+--------+------+
| age  | name  | gender | age  |
+------+-------+--------+------+
|    5 | david | m      | NULL |
|    4 | kelly | f      | NULL |
+------+-------+--------+------+```

How efficient is it?

As long as you have appropriate indexes on the table, this might not be as inefficient as you’d think. It’s theoretically a cross join, yes, but in reality if there’s a good index it’s only a repeated cross join on subsets of the data. In other words, you need a (gender, age) index on this table. Gender isn’t a very good example to use for this, because it will never be very selective, but if you only have a few rows per group and you have a leftmost index on the grouping column, it should work fine.

Conclusion

As with so many other SQL challenges, if you re-phrase the question, it’s easy to select the maximum or minimum row per group without subqueries. The key is to understand what you want, and to be able to word the problem in a way that translates from English to SQL.

Written by Xaprb

March 14th, 2007 at 8:32 pm

Posted in Uncategorized

16 Responses to 'How to find the max row per group in SQL without subqueries'

1. Some more ways to solve the same problem again and again: http://jan.kneschke.de/projects/mysql/groupwise-max :)

Jan Kneschke

15 Mar 07 at 5:44 am

2. If you only need to see min/max values and the group, then

`select gender,min(age) from person group by gender`

will work for you too.

Anonymous

15 Mar 07 at 12:47 pm

3. Can’t you do
`select *,min(age) from person group by gender`? Maybe not.

Aaron

16 Mar 07 at 2:42 pm

4. Aaron, that query is possible in some RDBMSs (notably MySQL with strict mode disabled) but it’s a many-to-one trap.

Xaprb

17 Mar 07 at 8:18 am

5. This looks good when you only have one person of minimum age for each genger. If you add another row, say

5 robert m

You will get three rows in the result, which may not be what was wanted.

RobertPhoenix

1 Aug 08 at 8:04 pm

6. [...] but effective–just follow the guidelines in this post. Performance is great, much better by far than using sub-queries. I found a marginal performance [...]

7. [...] How to find max without subqueries [...]

8. Fascinating and clearly explained.

I thought I was overlooking something – always using subqueries when I do this, but this may be very useful for me.

Vince Taylor

9 Apr 09 at 4:32 am

9. wonderful method.

when you need create a view in mysql which you want to get the latest item in every group by the some date col, this method is perfect because mysql doesn’t allow subquery in the view

Jack Su

30 Jun 09 at 6:10 am

10. Thanks!!! This is exactly what I was trying to figure out!

Joe

30 Jul 09 at 2:37 pm

11. To me works fine too, in this case show this person as well as required

>This looks good when you only have one person of >minimum age for each genger. If you add another row, >say

>5 robert m

>You will get three rows in the result, which may not >be what was wanted.

Ignacio

27 May 10 at 12:05 pm

12. the last query and its output do not match … the last age column in output should not be there

arun

2 Sep 10 at 12:51 pm

13. PERFECT!

After spending hours and hours and trying many many different queries, this solution work perfect for me.

very fast on a standart pc with 30.000 rows. Only 0.0399 secs.

Good job!

Ali desitero

7 Dec 10 at 9:31 am

14. Thank you so much for this post. The solution I was using with subqueries was horribly inefficient but this did the trick. As you said, it’s surprisingly fast too.

Anonymous

21 Dec 10 at 3:38 pm

15. As a followup, because it took me a few minutes before the facepalm. If you wanted the youngest, coolest kid, because you have multiple kids of the same age (assuming a coolness column), you’d change the join to:

left outer join person as younger on younger.gender = young.gender
and (younger.age young.coolness))

Thomas

9 Jun 11 at 5:09 pm

16. Stumbled on this after being stymied by MySQL 3.23′s lack of sub-queries. Extremely helpful, thanks so much!

David Kleszyk

14 Jan 12 at 5:03 pm