Xaprb

Stay curious!

How to select the first/least/max row per group in SQL

with 154 comments

Here are some common SQL problems, all of which have related solutions: how do I find the most recent log entry for each program? How do I find the most popular item from each category? How do I find the top score for each player? In general, these types of “select the extreme from each group” queries can be solved with the same techniques. I’ll explain how to do that in this article, including the harder problem of selecting the top N entries, not just the top 1.

This topic is related to numbering rows, which I just wrote about (see my articles about MySQL-specific and generic techniques to assign a number to each row in a group). Therefore I’ll use nearly the same table and data as I used in those articles, with the addition of a price column:

+--------+------------+-------+
| type   | variety    | price |
+--------+------------+-------+
| apple  | gala       |  2.79 | 
| apple  | fuji       |  0.24 | 
| apple  | limbertwig |  2.87 | 
| orange | valencia   |  3.59 | 
| orange | navel      |  9.36 | 
| pear   | bradford   |  6.05 | 
| pear   | bartlett   |  2.14 | 
| cherry | bing       |  2.55 | 
| cherry | chelan     |  6.33 | 
+--------+------------+-------+

Selecting the one maximum row from each group

Let’s say I want to select the most recent log entry for each program, or the most recent changes in an audit table, or something of the sort. This question comes up over and over on IRC channels and mailing lists. I’ll re-phrase the question in terms of fruits. I want to select the cheapest fruit from each type. Here’s the desired result:

+--------+----------+-------+
| type   | variety  | price |
+--------+----------+-------+
| apple  | fuji     |  0.24 | 
| orange | valencia |  3.59 | 
| pear   | bartlett |  2.14 | 
| cherry | bing     |  2.55 | 
+--------+----------+-------+

There are a few common solutions to this problem. All involve two steps: finding the desired value of price, and then selecting the rest of the row based on that.

One common solution is a so-called self-join. Step one is to group the fruits by type (apple, cherry etc) and choose the minimum price:

select type, min(price) as minprice
from fruits
group by type;
+--------+----------+
| type   | minprice |
+--------+----------+
| apple  |     0.24 | 
| cherry |     2.55 | 
| orange |     3.59 | 
| pear   |     2.14 | 
+--------+----------+

Step two is to select the rest of the row by joining these results back to the same table. Since the first query is grouped, it needs to be put into a subquery so it can be joined against the non-grouped table:

select f.type, f.variety, f.price
from (
   select type, min(price) as minprice
   from fruits group by type
) as x inner join fruits as f on f.type = x.type and f.price = x.minprice;
+--------+----------+-------+
| type   | variety  | price |
+--------+----------+-------+
| apple  | fuji     |  0.24 | 
| cherry | bing     |  2.55 | 
| orange | valencia |  3.59 | 
| pear   | bartlett |  2.14 | 
+--------+----------+-------+

Another common way to do this is with a correlated subquery. This can be much less efficient, depending on how good your system’s query optimizer is. You might find it clearer, though.

select type, variety, price
from fruits
where price = (select min(price) from fruits as f where f.type = fruits.type);
+--------+----------+-------+
| type   | variety  | price |
+--------+----------+-------+
| apple  | fuji     |  0.24 | 
| orange | valencia |  3.59 | 
| pear   | bartlett |  2.14 | 
| cherry | bing     |  2.55 | 
+--------+----------+-------+

Both queries are logically equivalent, though they may not perform the same.

Select the top N rows from each group

This is a slightly harder problem to solve. Finding a single row from each group is easy with SQL’s aggregate functions (MIN(), MAX(), and so on). Finding the first several from each group is not possible with that method because aggregate functions only return a single value. Still, it’s possible to do.

Let’s say I want to select the two cheapest fruits from each type. Here’s a first try:

select type, variety, price
from fruits
where price = (select min(price) from fruits as f where f.type = fruits.type)
   or price = (select min(price) from fruits as f where f.type = fruits.type
      and price > (select min(price) from fruits as f2 where f2.type = fruits.type));
+--------+----------+-------+
| type   | variety  | price |
+--------+----------+-------+
| apple  | gala     |  2.79 | 
| apple  | fuji     |  0.24 | 
| orange | valencia |  3.59 | 
| orange | navel    |  9.36 | 
| pear   | bradford |  6.05 | 
| pear   | bartlett |  2.14 | 
| cherry | bing     |  2.55 | 
| cherry | chelan   |  6.33 | 
+--------+----------+-------+

Yuck! That can be written as a self-join, but it’s just as bad (I leave it as an exercise for the reader). This gets worse as you go to higher numbers (top 3, top 4…). There are other ways to phrase the statement, but they all boil down to the same thing, and they’re all pretty unwieldy and inefficient.

There’s a better way: select the variety from each type where the variety is no more than the second-cheapest of that type.

select type, variety, price
from fruits
where (
   select count(*) from fruits as f
   where f.type = fruits.type and f.price < fruits.price
) <= 2;

This is elegant, and lets you vary N without rewriting your query (a very good thing!), but it’s functionally the same as the previous query. Both are essentially a quadratic algorithm relative to the number of varieties in each type. And again, some query optimizers may not do well with this and make it quadratic with respect to the number of rows in the table overall (especially if no useful index is defined), and the server might get clobbered. Are there better ways? Can it be done with one pass through the data, instead of the many passes required by a correlated subquery? You know it can, or I wouldn’t be writing this, now would I?

Use UNION

If there’s an index on (type, price), and there are many more records to eliminate than to include in each group, a more efficient single-pass method (especially for MySQL, but also for some other RDBMSs) is to break the queries out separately and put a limit on each, then UNION them all back together. Here’s the syntax you need for MySQL:

(select * from fruits where type = 'apple' order by price limit 2)
union all
(select * from fruits where type = 'orange' order by price limit 2)
union all
(select * from fruits where type = 'pear' order by price limit 2)
union all
(select * from fruits where type = 'cherry' order by price limit 2)

Peter Zaitsev has written in detail about this technique, so I won’t go into it too much more here. If it suits your purposes, it can be a very good solution.

One note: use UNION ALL, not just UNION. It prevents the server sorting the results to eliminate duplicates before returning them. In this case there will be no duplicates, so I’m telling the server to skip that (useless, expensive) step.

Do it with user variables on MySQL

The UNION trick is an especially good idea when the results are a small fraction of the rows in the table and there is an index that can be used for sorting the rows. Another linear-time technique, which might be a good option in cases where you are selecting most of the rows from the table anyway, is user variables. This is MySQL-specific. Please refer to my previous post on how to number rows in MySQL for the gory details of why this works:

set @num := 0, @type := '';

select type, variety, price
from (
   select type, variety, price,
      @num := if(@type = type, @num + 1, 1) as row_number,
      @type := type as dummy
  from fruits
  order by type, price
) as x where x.row_number <= 2;

This isn’t one pass through the table, by the way. The subquery is implemented as a temporary table behind the scenes, so filling it with data is one pass; then selecting every row from it and applying the WHERE clause is another. However, twice through is still O(n) with respect to the table size. That’s a lot better than correlated subqueries, which are O(n2) with respect to the group size — even moderate group sizes cause bad performance (say there are five varieties of each fruit. That’s on the order of 25 passes through the table, all told).

One-pass technique on MySQL… maybe?

If you want to leave your queries up the the query optimizer’s whims, you can try this technique, which builds no temporary tables and makes just one pass through:

set @num := 0, @type := '';

select type, variety, price,
      @num := if(@type = type, @num + 1, 1) as row_number,
      @type := type as dummy
from fruits
group by type, price, variety
having row_number <= 2;

This theoretically ought to work if MySQL orders by the GROUP BY criteria, which it sometimes does for efficiency and to produce the expected results. Does it work? Here’s what it returns on MySQL 5.0.27 on Windows:

+--------+----------+-------+------------+--------+
| type   | variety  | price | row_number | dummy  |
+--------+----------+-------+------------+--------+
| apple  | gala     |  2.79 |          1 | apple  |
| apple  | fuji     |  0.24 |          3 | apple  |
| orange | valencia |  3.59 |          1 | orange |
| orange | navel    |  9.36 |          3 | orange |
| pear   | bradford |  6.05 |          1 | pear   |
| pear   | bartlett |  2.14 |          3 | pear   |
| cherry | bing     |  2.55 |          1 | cherry |
| cherry | chelan   |  6.33 |          3 | cherry |
+--------+----------+-------+------------+--------+

Look closely… it’s returning rows one and three from each group, and they’re not numbered in order of increasing price? Huh? But the HAVING clause says the row_number should be no greater than 2! Here’s what it returns on version 5.0.24a on Ubuntu:

+--------+------------+-------+------------+--------+
| type   | variety    | price | row_number | dummy  |
+--------+------------+-------+------------+--------+
| apple  | fuji       |  0.24 |          1 | apple  |
| apple  | gala       |  2.79 |          1 | apple  |
| apple  | limbertwig |  2.87 |          1 | apple  |
| cherry | bing       |  2.55 |          1 | cherry |
| cherry | chelan     |  6.33 |          1 | cherry |
| orange | valencia   |  3.59 |          1 | orange |
| orange | navel      |  9.36 |          1 | orange |
| pear   | bartlett   |  2.14 |          1 | pear   |
| pear   | bradford   |  6.05 |          1 | pear   |
+--------+------------+-------+------------+--------+

Look, this time everything is numbered 1 and every row is returned. Wonky. This is exactly what the MySQL manual page on user variables warns about.

This technique is pretty much non-deterministic, because it relies on things that you and I don’t get to control directly, such as which indexes MySQL decides to use for grouping. However, if you need to use it — and I know there are some folks out there who do, because I’ve consulted for them — you can still tweak it. We’re getting into the realm of really bastardizing SQL, but the results above came from a table without indexes other than the primary key on (type, variety). What happens if I add an index MySQL can use for grouping?

alter table fruits add key(type, price);

Nothing changes, and EXPLAIN shows why: the query doesn’t use the index I just added. Why? Because the grouping is on three columns, and the index is only on two. In fact, the query is using a temp table and filesort anyway, so this is still not achieving the once-through goal. I can force it to use the index:

set @num := 0, @type := '';

select type, variety, price,
      @num := if(@type = type, @num + 1, 1) as row_number,
      @type := type as dummy
from fruits force index(type)
group by type, price, variety
having row_number <= 2;

Let’s see if that works:

+--------+----------+-------+------------+--------+
| type   | variety  | price | row_number | dummy  |
+--------+----------+-------+------------+--------+
| apple  | fuji     |  0.24 |          1 | apple  | 
| apple  | gala     |  2.79 |          2 | apple  | 
| cherry | bing     |  2.55 |          1 | cherry | 
| cherry | chelan   |  6.33 |          2 | cherry | 
| orange | valencia |  3.59 |          1 | orange | 
| orange | navel    |  9.36 |          2 | orange | 
| pear   | bartlett |  2.14 |          1 | pear   | 
| pear   | bradford |  6.05 |          2 | pear   | 
+--------+----------+-------+------------+--------+

Ah, now we’re cooking! It did what I wanted, without a filesort or temporary table. Another way to do this, by the way, is to take variety out of the GROUP BY so it uses the index on its own. Because this selects a non-grouped column from a grouped query, this only works if you are running with ONLY_FULL_GROUP_BY mode turned off, which I hope you are not doing without good reason.

Other methods

Be sure to check the comments for user-contributed methods. There are some really novel approaches. I always learn so much from your comments… thank you!

Conclusion

Well, that’s it. I’ve shown you several ways of solving the common “get the extreme row from each group” query, and then moved on to how you can get the top N rows from each group in various ways. Then I dove into MySQL-specific techniques which some (including myself, depending on my mood) would regard as mildly foolish to utterly stupid. But if you need the last bit of speed out of your server, you sometimes have to know when to break the rules. And for those who think this is just MySQL foolishness, it’s not; I’ve seen people desperately do these types of things on other platforms too, such as SQL Server. There are hacks and tweaks on every platform, and people who need to use them.

I hope you’ve enjoyed and profited from this discussion. If you did, maybe you’d like to subscribe for convenient notification of future articles. Happy coding!

Further Reading:

Written by Xaprb

December 7th, 2006 at 8:59 am

Posted in Uncategorized

Tagged with

154 Responses to 'How to select the first/least/max row per group in SQL'

Subscribe to comments with RSS

  1. [...] How to select the first/least/max row per group in SQL [...]

  2. I believe you can also solve the “cheapest 2 from each group” problem with a LEFT JOIN:

    SELECT f.type, f.variety, f.price
    FROM fruits f
    LEFT JOIN fruits f2 ON (f.variety = f2.variety
      AND f.price <= f2.price)
    GROUP BY f.type, f.variety, f.price
    HAVING COUNT(*) <= 2
    

    This is untested SQL, but hopefully you get the idea.

    Perrin Harkins

    7 Dec 06 at 4:57 pm

  3. The concept is right, but you got your criterion slightly mixed.

    I knew about this one but somehow didn’t think to include it. Thanks for the reminder. It is equivalent to the correlated subquery method, but without subqueries of course. (it is still O(n^2) though).

    Xaprb

    7 Dec 06 at 5:06 pm

  4. Excellent! This taught me how to solve a problem I was working on.

    Thanks!

    Marc

    29 Dec 06 at 6:24 pm

  5. what about:

    select type,substr(min(concat(lpad(cast((price*100) as char),10,'0'),variety)),11),min(price) from fruits;

    one pass, zero subqueries, the only drawback is that it can’t be used with float type. It can also use a index for the min if you materialize the expression in the min with a trigger and index it.

    Cristiano

    16 Jan 07 at 11:52 am

  6. Cristiano, that’s a novel approach, and a great tip, though you forgot the GROUP BY type that’s necessary to prevent non-deterministic results. For a sparsely grouped table (e.g. thousands of rows per type), adding an extra column and indexing it would probably have diminishing returns, but I can see some situations where it would be a very good idea.

    Thanks for writing in; this is an approach I’d never thought about. Now I have another possible way to solve problems!

    Xaprb

    16 Jan 07 at 2:40 pm

  7. This was unbelievably helpful

    Isaac

    26 Jan 07 at 10:46 pm

  8. Great job. Thanks a lot for your article, very helpful.

    Enrique

    14 Feb 07 at 3:30 am

  9. Thanks for this superb article.

    Raj

    1 Mar 07 at 2:58 am

  10. RE: Select the top N rows from each group

    To get better performance for table with large number of rows, using identity() function and temporary table (if your database has this two feature). The following scripts have been tested in SQL Server 2000.

    create table fruits
    (
      type varchar(10),
      variety varchar(10),
      price decimal(18,2)
    )
    
    insert into fruits values( 'apple'  , 'gala'       ,  2.79)
    insert into fruits values( 'apple'  , 'fuji'       ,  0.24)
    insert into fruits values( 'apple'  , 'limbertwig' ,  2.87)
    insert into fruits values( 'orange' , 'valencia'   ,  3.59)
    insert into fruits values( 'orange' , 'navel'      ,  9.36)
    insert into fruits values( 'pear'   , 'bradford'   ,  6.05)
    insert into fruits values( 'pear'   , 'bartlett'   ,  2.14)
    insert into fruits values( 'cherry' , 'bing'       ,  2.55)
    insert into fruits values( 'cherry' , 'chelan'     ,  6.33)
    
    select type, variety, price, identity(int, 1, 1) as price_rank
    into #fruits
    from fruits
    order by type, price
    
    select f1.type, f1.variety, f1.price
    from #fruits f1,
    (
    	select type, min(price_rank) as min_price_rank
    	from #fruits
    	group by type
    ) f2
    where f1.type = f2.type
      and f1.price_rank < f2.min_price_rank + 2

    Haitao

    12 Mar 07 at 9:18 pm

  11. Thanks Haitao. For those who don’t know, prefixing a table name with # in SQL Server makes it a temporary table.

    Xaprb

    13 Mar 07 at 7:53 am

  12. RE: Median calculation made easy by reusing the ‘Select the top N rows from each group’ query

    Median calculation is one of the nightmares for some database developers (including myself), and it is disappointing that SQL Server 2005 still does not have the function available, still not. The calculation I googled so far are all correlation queries which are far too complex and not practical for tables with millions of records, because of the poor performance. But I just realize that the code I posted above just one step away from the median function, haha gotya…

    create table fruits (type varchar(10), variety varchar(10), price decimal(18,2))
    
    insert into fruits values( 'apple'  , 'gala'       ,  2.79)
    insert into fruits values( 'apple'  , 'fuji'       ,  0.24)
    insert into fruits values( 'apple'  , 'limbertwig' ,  2.87)
    insert into fruits values( 'orange' , 'valencia'   ,  3.59)
    insert into fruits values( 'orange' , 'navel'      ,  9.36)
    insert into fruits values( 'pear'   , 'bradford'   ,  6.05)
    insert into fruits values( 'pear'   , 'bartlett'   ,  2.14)
    insert into fruits values( 'cherry' , 'bing'       ,  2.55)
    insert into fruits values( 'cherry' , 'chelan'     ,  6.33)
    
    select type, variety, price, identity(int, 1, 1) as price_rank
    into #fruits
    from fruits
    order by type, price
    
    select f1.type, avg(f1.price) as median_price
    from #fruits f1,
    (
      select  type,
                min(price_rank) as min_price_rank,
                max(price_rank) as max_price_rank
    from #fruits
    group by type
    ) f2
    where f1.type = f2.type
      and (	   f1.price_rank = (max_price_rank + min_price_rank)     / 2
            or f1.price_rank = (max_price_rank + min_price_rank + 1) / 2)
    group by f1.type

    Basically it is to average the middle two(if even) or one(odd) value in a group.

    Being tested under SQL Server 2000 for a table with 2 million records, it took less than a minute. The part took most of the time is to generate the temporary table, so make sure you only put the necessary fields into the #table.

    Hope this helps.

    Also, cheers to Xaprb for the well-maintained blog.

    Haitao

    13 Mar 07 at 7:30 pm

  13. Hello,

    I have a simple SQL question.

    Here is the output I obtain when I issue the following command :

    SELECT code_age_01, nuitees, COUNT(*) FROM stats_2007_03_15__00_03_29 GROUP BY code_age_01, nuitees;
    
    +-------------+---------+----------+
    | code_age_01 | nuitees | COUNT(*) |
    +-------------+---------+----------+
    | A           |       1 |        3 |
    | A           |       2 |       11 |
    | A           |       3 |        3 |
    | A           |       5 |        5 |
    | A           |       8 |        1 |
    | B           |       1 |        5 |
    ... snipped ...
    | H           |       4 |        1 |
    | H           |       8 |        1 |
    | I           |       0 |        1 |
    | I           |       2 |        2 |
    | I           |       4 |        1 |
    | I           |       6 |        1 |
    | I           |       8 |        1 |
    +-------------+---------+----------+

    The question is as follows :

    Is there a way to reformat the SQL command in such a way that the resulting table would be what I call “a two-way table,” that is :

    - the top row would list all the “nuitees” values,
    - the first column would list all the “code_age_01″ values
    - all the other cells of the table would list the “count(*)” value for the “nuitees” value of the column and the “code_age_01″ value of the row.

    In other words, how can I obtain the following output :

       0  1  2  3  4  5  6  7  8  9
    ==================
    A
    B
    C
    D
    E
    F               values
    G                 for
    H               count(*)
    I

    Thank you,

    Pierre Caron

    Pierre Caron

    15 Mar 07 at 12:53 am

  14. Bonjour Pierre,

    What you need is called a “cross tabulation” or “pivot table.” Giuseppe Maxia wrote a great article about cross tabulations some time ago.

    Xaprb

    15 Mar 07 at 9:04 am

  15. On the topic “Selecting the one maximum row from each group”, how would this look if you got data from two tables, e.g. you split this table up with a column productid to link them together. In one table you have productid, type and variety. In the second table you have productid and price. Is it still possible to get it into one SQL statement?

    RN

    19 Mar 07 at 12:33 pm

  16. Certainly. You can do a subquery in the FROM clause, joining the two tables together in the suquery. Now they appear as one table to the rest of the query (hence the reason this is sometimes called a “derived table,” which isn’t technically incorrect, but it’s a misuse of the term).

    That’s just a one-size-fits-all answer. In any given query there are likely to be many other ways to do it.

    Xaprb

    19 Mar 07 at 1:23 pm

  17. I have a similar problem I am trying to solve, but in my case I need to get only the top 1 per group, but within the group I need the greatest date and ID.

    I was using the following

    select
    	a.CampaignID,
    	a.Email,
    	a.TS
    into #tempLog
    from dbo.xLog a
    inner join (select email,max(TS) as TS from xLog where campaignid = @campaignid group by email) b
    on a.email=b.email and
       a.TS = b.TS
    where campaignid = @campaignid

    This did not work because it turned out I had duplicate records with the same date (TS) per email.

    The xLog table also has an ID (identity) field, but I need to firstly grab the latest timestamp before getting the greatest ID.

    I’m believe this will give me the correct results, although I don’t know how efficient it is.

    select
    	a.CampaignID,
    	a.Email,
    	a.TS
    into #tempLog
    from dbo.xLog a
    inner join (select email,max(cast(TS as varchar(24)) + cast(ID as nvarchar(10)) ) as X from xLog where campaignid = @campaignid group by email) b
    on a.email=b.email and
       cast(a.TS as varchar(24)) + cast(a.ID as nvarchar(10)) = b.X
    where campaignid = @campaignid

    Talahaski

    21 Mar 07 at 12:33 pm

  18. I found the “Selecting the one maximum row from each group” very helpful as well. It helped me write a query that I was finding difficult – joining a specific record from a grouped subquery back to the original table that I was grouping on in the subquery. I could not have done it without the clear and useful examples posted here.

    Byron V

    24 Mar 07 at 4:00 pm

  19. Genius and a life saver

    Carl

    30 Mar 07 at 7:47 pm

  20. This helped me do my task in an elegant way! Thanks for posting this! :)

    trebskie

    30 Mar 07 at 9:48 pm

  21. I was googling for a half hour and couldn’t find what I needed until I got here. Very clear explanation and exactly the code that I needed. Thanks very much!

    David

    3 Apr 07 at 5:49 pm

  22. Hello
    great article :)
    it would be interesting to edit the article and add Cristiano technique :-)

    matt

    16 Apr 07 at 11:47 pm

  23. Those are clear examples when using aggregate functions. I need to get the first record from a group, and the field is a char field. I am using advantage sql and do not have a first() func.

    Specifically, I am linking two tables by a common field, Jobno. I need all
    records from table a and only the first from b with the same jobno.

    table a
    jobno,quantity
    111,22
    112,23
    
    table b
    jobno,billno
    111,aaa
    111,aaa
    111.bbb
    112,ccc
    
    The result I need is
    jobno,billno
    111,aaa
    112,ccc

    Thanks – Ken

    Ken

    18 Apr 07 at 2:25 pm

  24. Ken, I don’t know anything about SQL Advantage, but you should be able to apply the techniques I mention in this article. Don’t make the common mistake of joining and then trying to extract. That is aggregation at the wrong place. Instead, aggregate and get the first record from table b. Now join that result to table a.

    I have never heard of a first() function.

    Xaprb

    18 Apr 07 at 2:53 pm

  25. Edit: Sorry, I didn’t include formatting the first time I posted.

    I’ve been searching high and low across the Internet kingdom for a solution to a problem I have. Your post has probably given the most insight so far but I’m still not there yet.

    I have two tables – category references and posts and I would like to obtain the latest 10 posts for each of the categories with all results ordered in descending date order.

    Category Reference Table

    Stores cat_id to reference a category in a separate table and post_id referencing the post from the table below. This is a referential table so that multiple categories can be assigned to each post.

    +--------+------------+
    | cat_id  | post_id     |
    +--------+------------+
    

    Posts Table

    The important fields are id and date. The date field needs to be used for all ordering.

    +--------+------------+
    | id        | date         |
    +--------+------------+

    I don’t want to have to resort to using separate queries so if the solutions provided could some how be applied to this, that would be excellent.

    Thanks!

    Rez

    20 Apr 07 at 8:31 pm

  26. [...] How to select the first/least/max row per group in SQL – Xaprb (tags: sql programming howto db tips aggregate) [...]

  27. Simply great page.
    It has taken me years to learn the hard way what’s in this page, and I still learned a lot by reading it.
    Thanks.

    Warning, the different methods give the same results because each price is DIFFERENT in each group.
    You have to be very carefull with queries like that if it’s not the case.

    If you want EXACTLY one line, Cristiano’s method is the one you want. It has also the advantage to be deterministic : at equal price, you get the first variety alphabetically. Sometimes what you get is the last alphabetically (if you had searched for the max instead).
    If that is a problem, you can play with binary invertion of part of the string to get the correct order back. If your locale makes the correct order inconsistent with character codes, you can use a fixed lengh column that contains a proper “reverse order” string corresponding to variety and your locale.

    The other methods will give all the lines with the best price. If that’s what you want, choose one of them.

    The top N of each group can be even more confusing :

    The first two examples given by Xaprb are not equivalent :

    the first will give all the fruits per type that is in the top N prices of the type.

    the second will give all the fruits per type that have less than
    N cheaper fruits. In other words, the N cheapest of each type evry fruits of the same price and category of one of these N.

    In both cases, you may get more than N rows per category.

    The UNION solution is the one that gives N maximum lines per category. Adding fruit to the ORDER BY clause allows to get deterministic results, but unfortunatly sometimes at a high performance cost.

    The same can be achieved on SQL server, using
    SELECT TOP N … instead of SELECT … LIMIT N .
    The same “ORDER BY” warnings apply.

    Some implementations allow to use Select TOP N or SELECT … LIMIT N
    in a subquery that refers to colums in the main query.
    When allowed you can have the top N for each group this way.
    When disallowed, error messages are generally cryptic, or sometimes results are false.
    When acceptable, performace wise, the second method, with a prorer order by and a removing of the extra unwanted lines per type at programming level is the best in my experience.

    Matthieu Haller

    17 Jul 07 at 11:51 am

  28. Fantastic, so simple! As I needed exactly one row I used Cristiano’s method (thanks for the tip Matthieu) and converted it to SQL server 2000 syntax, and in case this helps anyone else:

    select type,
    substring((
    min(
    right(replicate(’0′,10) convert(varchar(10),price*100),10) convert(varchar(10),variety)
    ))
    ,10,10 –add maximum possible length for variety here
    )

    ,min(price)
    from fruit
    group by type

    I tested this on my own data structure, then replaced the column and table name for those in the example.

    Michael Caldow

    2 Aug 07 at 12:49 pm

  29. If you dont expect too many values of a single type then it is possible to create a list of values of single type with a GROUP_CONCAT(cID) GROUP BY cType
    1,2,3,…
    4,5,6,…
    7,8,9,…

    select only first N values SUBSTRING_INDEX(cList, ‘,’, 2)
    1,2
    4,5
    7,8

    join with the table again
    1 – 1,2
    1 – 4,5
    1 – 7,8
    2 – 1,2
    ….

    select values that are in list – FIND_IN_SET(CID, CList)
    1 – 1,2
    2 – 1,2
    4 – 4,5
    5 – 4,5
    7 – 7,8
    8 – 7,8

    SELECT cID, … FROM Table
    LEFT JOIN (
    SELECT SUBSTRING_INDEX(cList,’,',10) AS cList FROM (
    SELECT GROUP_CONCAT(cID) AS cList FROM Table GROUP BY cType
    ) AS T1
    ) AS T2
    ON FIND_IN_SET(cID, cList)>0
    WHERE cList IS NOT NULL

    It might looks like a complicated but it is not, and it is MUCH faster than processing data outside SQL, keep on your mind that GROUP_CONCAT can result only

    Artur Filipiak

    7 Aug 07 at 5:58 am

  30. Hi! I have the following problem what i have this query i made from the article of finding duplicate rows but the downside of it is that it won’t put all the duplicates but only one because of the group by, i suppose… what i want is to show all IPs alike and their accounts

    Could you help me in this?

    Here is the query i’m using:
    select username, last_ip from account group by last_ip HAVING count(*) >1

    ——– ———–
    |username|last_ip |
    ——– ———–
    |abc |212.10.10.1|
    |cde |212.10.10.1|
    |ghi |198.12.50.4|
    |ijk |198.12.20.2|
    ——– ———–

    Syanneth

    17 Dec 07 at 10:29 pm

  31. if use GROUP statement,it will be a big problem to ORDER BY some fields which are not unique.

    angel

    4 Jan 08 at 6:41 am

  32. thank you very much.

    martin

    15 Jan 08 at 6:07 am

  33. Thanks! you helped me solve a problem. On Oracle databases only, a good way of limiting the number of rows returned is to use the system var ‘rownum’:
    eg: select *
    from ( select ixa.alert_id , ixa.alert_title, to_char(ixa.alert_date,’mm/dd/yyyy’)
    from iss_xf_alerts ixa
    where ixa.oid = ’5443′)
    order by ixa.alert_date desc) where rownum

    mike

    18 Feb 08 at 12:15 pm

  34. Great – very helpful!

    I’d be interested in the performance implications of these various methods – it’s easy testing on a simple db with a few rows, but how does (e.g. from your ‘Selecting the one maximum row from each group’):

    select f.type, f.variety, f.price
    from (
    select type, min(price) as minprice
    from fruits group by type
    ) as x inner join fruits as f on f.type = x.type and f.price = x.minprice;

    Work with a few million rows?

    Cheers,
    -Dan

    Dan

    24 Mar 08 at 2:57 am

  35. Genius ! this article just solve my half year headache :p

    Oh and dan I just use this methods on hundred thousands of data, and no performance issues so far

    WAWAN

    25 Mar 08 at 3:49 am

  36. Hello Xaprb.
    I’m novice in SQL statements.
    I have one question if is possible
    There is anyone how know how to make a query (using SQL selection) in access from one table.
    (The table is made in MS Access 2003).
    My table is like this:
    - Name of table: database
    - Columns: User_Name, Telephone, Time_call, Call_Duration_minutes.
    I want to see the first 3 longest values for Call_Duration_minutes
    Grouped by User_Name
    I want the results to be something like:
    User_Name Telephone Time_Call Call_Duration_minutes
    Aida 434443343 12:23 15
    Aida 34342234 14:32 14
    Aida 75656544 9:34 11
    Cobal 44554554 19:34 57
    Cobal 56544535 23:34 45
    Cobal 89767554 01:23 5
    Bob 34555432 03:45 125
    Bob 78678676 12:23 21
    Bob 67565444 17:54 11
    - – - – – - – -ETC ETC ETC ETC – - – – - -
    I made some queries but this one is to complicate for me,
    Thank you very much!
    Cheers,
    CRizz

    CRIzz

    8 Apr 08 at 4:41 pm

  37. Great Article!!!

    I just want to point a couple of SQL extensions that I think it could be very helpfull and powerfull and not very difficult to implement.

    Imagine a function called:
    LPOSITION_IN_INDEX( , , ,…)

    This function would return the position that a key would be inserted the key in a index. If the key alredy exists (not unique index) it would return the position just before (Left position).

    Suposing tht the table has the indes:

    ALTER TABLE fruits ADD INDEX XTypePrice(type,price);

    the select would be:

    SELECT type,variety,price
    FROM fruits
    WHERE LPOSITION_IN_INDEX( XTypePrice , type, price)-LPOSITION_IN_INDEX(XTypePrice, type,0)

    Jordi Baylina

    16 Apr 08 at 2:12 pm

  38. Great Article!!!

    I just want to point a couple of SQL extensions that I think it could be very helpfull and powerfull and not very difficult to implement.

    Imagine a function called:
    LPOSITION_IN_INDEX(&ltINDEX_NAME&gt ,&ltKEY_1stPART&gt ,[KEY_2ndPART] ,…)

    This function would return the position that a key would be inserted the key in a index. If the key alredy exists (not unique index) it would return the position just before (Left position).

    Suposing tht the table has the indes:

    ALTER TABLE fruits ADD INDEX XTypePrice(type,price);

    the select would be:

    SELECT type,variety,price
    FROM fruits
    WHERE LPOSITION_IN_INDEX( XTypePrice , type, price)-LPOSITION_IN_INDEX(XTypePrice, type,0) &lt2

    Jordi Baylina

    16 Apr 08 at 6:05 pm

  39. Another extension that I beleve that would be good is to define a function similar to MIN or MAX but with an extra argument. Lets call them MIN2 and MAX2

    the select would be:

    SELECT type,MIN2(price,variety),MIN(price)
    FROM fruits
    GROUP BY type

    The idea is the same that CASANO’s one but more elegant.

    MIN2 and MAX2 could be defined in a mysql UDF very easyly!!

    Jordi Baylina

    16 Apr 08 at 6:06 pm

  40. very helpful
    thanks

    jalil

    24 Apr 08 at 1:53 pm

  41. Isn’t the second method:

    select type, variety, price
    from fruits
    where price = (select min(price) from fruits as f where f.type = fruits.type

    just wrong?

    It more or less assumes that price will be unique. If the same price was repeated for 2 types of fruits and it happend to be the minumum price for one of them, the other fruit would also be returned.
    Am I right?

    Michal Talaga

    3 Jun 08 at 6:12 am

  42. hi, refer to the second-cheapest of that type, the corelated way, i don’t quite understand the count(*) for,
    and another question is, what if i’m seeking for random 2 rows instead of the cheapest 2, i tried removing the price comparison, it doesn’t work. anyone care to enlighten me? thanks

    sswong

    8 Jun 08 at 12:57 pm

  43. Thanks. You are great.
    I always come back to this page :)

    Abdul

    20 Jun 08 at 2:23 am

  44. try this out!

    create a temporary table (by using the #)

    create a cursor to select only the id of each group.

    loop until end of cursor and select top 1 record for each group
    and insert the result to the temporary.

    select all records of temporary and return the data

    that worked for me! following is the store procedure i used…

    CREATE PROCEDURE dbo.sp_GetExamsLastStatus (@FromDate datetime,
    @ToDate datetime)
    /*
    (
    @parameter1 int = 5,
    @parameter2 datatype OUTPUT
    )
    */
    AS
    – Declare the variables to store the values returned by FETCH.
    DECLARE @ExamID int, @StatusDateTime datetime, @Status smallint;
    DECLARE @PrvExamID int;

    – Create temporary table
    CREATE TABLE #ExamStatus(ExamID int, StatusDateTime datetime,Status smallint );

    – create cursor
    DECLARE Exam_Cursor CURSOR FOR
    SELECT examstatus.ExamID, StatusDateTime, Status
    FROM ExamStatus inner join Exam on examstatus.examid=exam.examid
    where exam.examdate between @FromDate and @ToDate
    ORDER BY ExamID, StatusDateTime DESC;

    – open curson
    OPEN Exam_Cursor;

    FETCH NEXT FROM Exam_Cursor
    INTO @ExamID, @StatusDateTime, @Status;
    SET @PrvExamID=0;
    WHILE @@FETCH_STATUS = 0 –loop until end of file
    BEGIN
    if @ExamID@PrvExamID –if record changed
    begin
    SET @PrvExamID=@ExamID –store new ID

    –select the top 1 ordered by date record and
    –save it to the temporary file
    insert into #ExamStatus (ExamID, StatusDateTime, Status)
    SELECT TOP (1) ExamID, StatusDateTime, Status
    FROM ExamStatus
    WHERE (ExamID = @ExamID)
    ORDER BY StatusDateTime DESC;

    END

    FETCH NEXT FROM Exam_Cursor
    INTO @ExamID, @StatusDateTime, @Status;
    END;
    CLOSE Exam_Cursor;

    DEALLOCATE Exam_Cursor;

    –show results
    –select ExamID, StatusDateTime, Status from #ExamStatus ;
    SELECT #ExamStatus.ExamID, Exam.ExamDate, Exam.StartTime, Exam.ExamHours, #ExamStatus.StatusDateTime, #ExamStatus.Status,
    ExamClass.Description AS ClassDescription
    FROM #ExamStatus INNER JOIN
    Exam ON #ExamStatus.ExamID = Exam.ExamID LEFT OUTER JOIN
    ExamClass ON Exam.SiteCenterID = ExamClass.SiteCenterID AND Exam.ClassID = ExamClass.ClassID
    ORDER BY Exam.ExamDate,Exam.StartTime;

    RETURN

    Isidoros

    20 Jun 08 at 4:59 am

  45. This is an extraordinarily useful article, and I’ll share how I used it to pull a mailing list in hopes that perhaps someone else with a similar type of issue might find practical value for the excellent techniques described in the article:

    We rented data from a data broker. In this data, you have a few hundred thousand names, and these names are generally grouped together at buildings (a B-to-B type environment) and have job titles. There are thousands of buildings, meaning that it’s not quite as simple as the fruit example, where there are only about five of them. This list doesn’t get run all the time, so performance is not as much an issue for me as it may be for others – therefore, I used the “elegant” solution listed above (just above USE UNION), as it was the clearest and easiest to modify (even if it does have to make multiple passes with a correlated subquery).

    What I did was:
    -First, rank the job titles numerically to create a hierarchy (I did this on a separate table which gets added to the FROM list in the SELECT statement described in the next step with an INNER JOIN). It’s more important for administrators to get the marketing piece than their peons, so they get a higher ranking. However: one may have the situation where many of the peons have the same job title (and the same rank), meaning that it’s more difficult to pull the top 5 ranks as 1, 2, 3, 4, 5 because you may have one 1 and six or more 4s, for instance.

    -SELECT the entire list into a separate table (can be temporary) that has an int IDENTITY column, sorting this list first by building ID and then by the job title rank. The third sort criteria (unspecified) seems based on the ID of the record which is fairly random (so if there are two fours, the one with the lower random ID number gets selected first) – but this could be specified with even more precision if desired.

    -Then use the “elegant” query to select the top x# of records at each building based on the IDENTITY column, where the lowest identity number is always the highest-ranked job-title-holders, and on down the list until you hit the x# limit.

    This ensures that no more than x# of people per building will get the mailing piece, and that at the very least the top dog(s) will get it and a few others. Other criteria can obviously be added to the sorting if desired based on whatever criteria one would like – whether they’d responded to prior offers, for instance – when it goes into the sub-table to get an IDENTITY value.

    If I want to make sure that a certain group of buildings (say, ones with more than 20 names at the location) get y# (let’s say five) of the pieces but ones with less than 10 get a lower number (x#, say three), then it is easy to SELECT these into two separate IDENTITY-set tables and then select from both of those tables with a UNION clause to get a final list.

    Hope this practical example helps someone just like this article helped me.

    M A Roberts

    23 Jul 08 at 3:03 pm

  46. This article is very useful.. I want to thank you for it..
    I’ve question please …
    How can I select the top 1 news ordered by date and time in each category without using the subqueries ..I was trying to do it with “One-pass technique” you talked about in this article but it didn’t work because the “GROUP BY” statement works before the “ORDER BY” (which I use it to order the news by date and time ) .. but I couldn’t get it ..
    Thank you ..

    AL0NE

    26 Jul 08 at 7:03 pm

  47. The article and the responses have been very helpful. I am trying to solve a somewhat related problem. In terms of the example in this article, What is the best way to query
    1) Sum of min price of all types?
    2) Sum of 2nd highest price of all types?

    Thanks.

    Sunil

    7 Aug 08 at 3:53 pm

  48. Xaprb

    8 Aug 08 at 11:41 am

  49. I wrote a nice comment but didnt see the antispam task! lost everything.
    I was so happy now im not!:-(
    well, thanks for the post anyway, helped me a lot.;-)

    Joana

    15 Aug 08 at 10:54 pm

  50. Hello -
    I need to select a ‘prf’ name type from a names table for all employees but for those who do not have a prf name, I need to get their ‘pri’ name. This seems so close to what I am doing, just cannot make it work.
    thanks,

    Samantha

    12 Sep 08 at 4:17 pm

  51. very much helpfull…
    keep it up.
    thanks a lot

    nikul

    10 Oct 08 at 11:35 am

  52. Hello,

    Hope you can help me. I’m having problems using a derived table instead of the “fruits” table in your example.
    The derived table combines 3 tables:
    Campaign
    Contact Response
    Contact

    that gives me the fields:
    Campaignname
    Campaignid
    Responsedate
    Contactid
    Prefix
    Firstname
    Lastname
    Accountid

    I wish to get this data for the earliest Responsedate per Accountid per campaignname.

    What I’ve come up with so far is:

    SELECT details.CAMPAIGNNAME,
    details.CAMPAIGNID,
    details.RESPONSEDATE,
    details.PREFIX,
    details.FIRSTNAME,
    details.LASTNAME,
    details.CONTACTID,
    details.ACCOUNTID

    FROM

    (SELECT main_table.CAMPAIGNNAME,
    main_table.ACCOUNTID,
    min(main_table.RESPONSEDATE) AS Earliest_Response_Date

    FROM
    (SELECT sysdba.CAMPAIGN.CAMPAIGNNAME,
    sysdba.CONTACTRESPONSE.CAMPAIGNID,
    sysdba.CONTACTRESPONSE.RESPONSEDATE,
    sysdba.CONTACT.PREFIX,
    sysdba.CONTACT.FIRSTNAME,
    sysdba.CONTACT.LASTNAME,
    sysdba.CONTACTRESPONSE.CONTACTID,
    sysdba.CONTACT.ACCOUNTID

    FROM sysdba.CONTACT INNER JOIN
    sysdba.CONTACTRESPONSE ON sysdba.CONTACT.CONTACTID = sysdba.CONTACTRESPONSE.CONTACTID INNER JOIN
    sysdba.CAMPAIGN ON sysdba.CONTACTRESPONSE.CAMPAIGNID = sysdba.CAMPAIGN.CAMPAIGNID

    ) AS main_table

    GROUP BY main_table.CAMPAIGNNAME,
    main_table.ACCOUNTID

    ) AS selecta INNER JOIN main_table AS details on
    details.CAMPAIGNNAME = selecta.CAMPAIGNNAME and
    details.ACCOUNTID = selecta.ACCOUNTID and
    details.RESPONSEDATE = selecta.Earliest_Response_Date;

    ——————————————————
    When I try to run this, I get the error message:

    Msg 208, Level 16, State 1, Line 1
    Invalid object name ‘main_table’.

    Thanks in advance.

    Tim

    21 Oct 08 at 6:29 am

  53. Hello -
    I need to select a ‘prf’ name type from a names table for all employees but for those who do not have a prf name, I need to get their ‘pri’ name. This seems so close to what I am doing, just cannot make it work.
    thanks,

  54. Look at COALESCE.

    Xaprb

    22 Oct 08 at 5:47 pm

  55. Excellent! Thank you for this info.

    WeinerDawg

    11 Nov 08 at 3:42 pm

  56. And another 2 solutions:

    First get the rank of one entry:

    1. Oracle/Transact/SQL-Server Compatible with rank()

    SELECT TYPE,variety,Rank() over (PARTITION BY TYPE ORDER BY variety) FROM fruit

    2. Plain SQL-Compatible (here Oracle-Syntax)
    select a.type, a.variety,count(*) as rank
    from
    (
    select TYPE,variety,price
    from fruit
    group by TYPE,variety,price
    ) a
    inner join
    (
    select TYPE,variety,price
    from fruit
    group by TYPE,variety,price
    ) b
    on a.type = b.type and a.variety <= b.variety
    group by a.type, a.variety, a.price
    order by a.type, a.variety, count(*)

    Afterwards you can select on this rank

    graf

    27 Nov 08 at 8:30 am

  57. Thank you for sharing your knowledge. I’ve been trying to work out a way to get the max time row from GPS data for mulitple vehicles (hundreds of thousands of rows a day). Using ‘Selecting the one maximum row from each group’ I got a workable solution. Now I can port it over to my Arcmap and start getting some meaningful critic of our AVL responses.

    Here’s what I ended up with if anyone is interested. I’ll need to add variables yet, but the basics work.

    select
    t2.callsign
    ,t2.unitstatus
    ,t2.datetimestamp
    ,t2.AssignedIncident
    ,t2.Latitude
    ,t2.Longitude
    ,t2.Speed
    ,t2.Heading
    ,t2.SVCount
    ,t2.Source
    ,t2.AgeOfData
    From
    (select callsign, max(DateTimeStamp)as maxtime
    from dbo.HISTORY_200812
    where CallSign like ‘LALA%’and DateTimeStamp <= ’2008-12-04 22:13:00′
    group by CallSign ) as t1
    inner join
    dbo.HISTORY_200812 as t2 on t2.callsign=t1.callsign and t2.datetimestamp=t1.maxtime
    order by
    t2.callsign

    Thom W

    12 Dec 08 at 7:45 pm

  58. Please Help
    I have the same problem but with two tables.
    Table 1. is called products
    Table 2 is called ProductVariations.

    Both have a field called ProductID (this is the primary key for the products table.

    The variations table has the price.
    I want a simple query that will Get me only one product variation per product. That being the lowest price.

    Please help.

    I dont know if i should cut an paste the SQL code to generate these tables????

    frank barbito

    20 Dec 08 at 12:07 pm

  59. what if i wanted the min and max amount in one row?

    juna

    12 Jan 09 at 4:53 am

  60. Thanks for this very clearly written and helpful post.

    Steve

    17 Jan 09 at 3:29 pm

  61. [...] How to select the first/least/max row per group in SQL at Xaprb [...]

  62. THANKS A LOT. THE ARTICLE WAS REALLY HELPFUL.

    AMITABH

    9 Mar 09 at 3:50 pm

  63. this was really helpful…………………….

    Nirmo

    17 Mar 09 at 5:25 am

  64. Thank you so much for this article.

    I can’t get the method with user variables to work correctly (I suspect there’s aliasing going on with the multiple JOINs and ORDER BYs I need to do), and I need to ORDER BY floating-point values so I can’t use Cristiano’s method. Instead, I derived a different method:

    SELECT type,
    SUBSTRING_INDEX(GROUP_CONCAT(variety ORDER BY price), ‘,’, 1) AS cheapest_variety,
    SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price), ‘,’, 1) AS cheapest_price
    FROM fruits
    GROUP BY type;

    GROUP_CONCAT() concatenates all members of the group, in the specified order, while SUBSTRING_INDEX() pulls out the element of interest.

    I’m not sure if the above example is doing an extra ORDER BY (so as to extract the price of the cheapest variety), but in my case, I’m happy enough to get a unique identifier for the row (so I only need a single ORDER BY). I guess if you want values from the entire row returned without multiple ORDER BYs, you could CONCAT_WS() the fields with a different separator than is used by GROUP_CONCAT, and then parse the separate fields out later.

    Paul Price

    24 Mar 09 at 3:04 pm

  65. Great website, great article. VERY HELPFUL SOLUTIONS and theory!

    Michael B

    24 Mar 09 at 5:51 pm

  66. I wish to know how I can find A – B (The data in A but not in B) using JOIN.

    table_A

    name
    —-
    A
    B
    C

    table_B

    name
    —-
    B
    D
    E

    Thanks in advance

    ABRAHAM

    5 Apr 09 at 3:52 pm

  67. ABRAHAM,you can do it by left join.

    SELECT a.name FROM a LEFT JOIN b USING(name)
    WHERE b.name IS NULL

    angel

    6 Apr 09 at 3:04 am

  68. I want to know how the latest records updated for a column in a table can be retrieved from joining 3 tables together.
    eg :
    Table 1
    A1 pk1
    B1 pk2
    C1 pk3

    Table 2
    N1 date1 2_pk1
    N2 date2 2_pk2
    N3 date3 2_pk3
    N4 date1 2_pk4
    N5 date2 2_pk5
    Table 3
    pk1 2_pk1
    pk2 2_pk2
    pk3 2_pk3
    pk4 2_pk4

    Now in the above 3 tables I want fetch ‘date1′ of table 2 for each of A1 B1 and C1 in table1.I need to join the third table as wel.

    Please post your suggestions asap.Thanks in advance.

    Vish

    8 Apr 09 at 8:33 am

  69. We are creating a concerts web site. Each concert has various show times (1-to-many). When viewing a concert summary, we want to show From min date to max date, like May 29 – May 30 Still need to figure out how to hide the second month string if all show times are in the same month.

    Here’s a query for mysql that’s based on the simple self-join in this great tuturial. Sorry I don’t have time to simplify the field names to make it clearer.

    Hope it helps a passer-by like this tutorial helped me.

    select n.nid, n.title, x.minDate, y.maxDate from node n inner join
    (select node.nid, min(cte.field_datestamp_value) as minDate from node
    inner join (og_ancestry oga inner join content_type_event cte on oga.nid = cte.nid) on node.nid = oga.group_nid
    where (DATE_FORMAT(ADDTIME(FROM_UNIXTIME(cte.field_datestamp_value), SEC_TO_TIME(-25200)), ‘%Y-%m-%d’) >= ’2009-06-27′)
    group by node.nid) as x on n.nid = x.nid
    inner join
    (select node.nid, max(cte.field_datestamp_value) as maxDate from node
    inner join (og_ancestry oga inner join content_type_event cte on oga.nid = cte.nid) on node.nid = oga.group_nid
    where (DATE_FORMAT(ADDTIME(FROM_UNIXTIME(cte.field_datestamp_value), SEC_TO_TIME(-25200)), ‘%Y-%m-%d’) >= ’2009-06-27′)
    group by node.nid) as y on y.nid = n.nid

    Darko

    17 Apr 09 at 1:53 am

  70. Hello All,

    I have 2 tables.

    1) names

    id | name
    —-+———
    1 | A
    1 | BB
    1 | CCC
    2 | AAA-BBB
    5 | FIVE

    and b) criteria

    id | min | max
    —-+—–+—–
    1 | 1 | 2
    2 | 1 | 1

    I wish to get all rows from “names” which satisfy the min, max condition from “criteria” and rows with id’s resent in “names” but not in “criteria”

    id | name
    —-+———
    1 | A
    1 | B
    5 | FIVE

    I believe it should be a LEFT JOIN …not 100% sure.

    Thanks & Regards
    Roxana

    Roxana ABRAHAM

    4 May 09 at 6:00 pm

  71. Is that wat you need?

    select N.Id,N.Name
    from names N
    left outer join criteria C on N.Id=C.Id
    where (C.id is NULL) or
    ( (length(name)>=C.min) and
    (length(name)<=C.max)
    )

    Jordi Baylina

    5 May 09 at 3:04 am

  72. This is great, it helped me figure out how to write my query….although once I saw your second solution, I realized I had known all along, but just had brain fog. Nevertheless, I appreciate being able to find clever solutions to this problem!

    Put more information about fancy SQL out there, I love it!

    Maggi Tumulty

    9 Jun 09 at 9:04 am

  73. Very interesting. Thanks for putting this online.

    nir levy

    15 Jun 09 at 2:54 am

  74. Below are 3 fields and I want to group rows ,condition as follows.

    Show row for latest date (for field B )
    by latest date I mean 06/28/2002 >01/14/2001 , 06/28/2002 is latest date.
    If
    there are two latest date
    then
    take field C and find out latest date rows in C

    See below for more explanation.

    Table(TB ) has following fields( just giving fields in question):
    Code:

    A B C
    2 06/28/2002 01/14/2001
    2 11/21/1997 04/25/1995
    2 09/19/2006 12/27/2002 <— B has latest date ,so we show this row

    3 01/23/2001 11/22/1999
    3 02/19/2007 10/23/2002 <— B has latest date , so show this row
    3 17/06/2000 09/14/2002

    4 10/25/2002 06/22/2001
    4 10/25/2002 04/24/1980

    04/24/2001, so show this row.

    4 04/25/2002 05/12/1981
    4 03/13/1999 02/11/1985

    5 06/06/1998 05/05/1995

    The output should be :

    A B C
    2 09/19/2006 12/27/2002
    3 02/19/2007 10/23/2002
    4 10/25/2002 06/22/2001
    5 06/06/1998 05/05/1995

    The query for just finding latest of B is as followed
    Code:

    SELECT A, MAX(B),C
    FROM TB
    GROUP BY A
    WITH UR;

    I want to extend it to above requirement.

    Tokci

    17 Jun 09 at 4:46 am

  75. Somehow att the formatting is gone..
    Some more details which missed out is as followed

    a b c
    4 10/25/2002 06/22/2001
    4 10/25/2002 04/24/1980

    04/24/2001, so show this row.

    Tokci

    17 Jun 09 at 4:49 am

  76. Can anybody help me how to force an incex on this SQL so I can limit per group:

    SET @num := 0, @kategori := ”;
    SELECT
    kategori.id AS kat_id,
    kategori.catname AS kat_navn,
    kategori.image AS kat_billede,
    kategori.publishedcat AS kat_publiseret,
    kategori.ordering AS kat_order,
    begivenhed.id AS b_id,
    begivenhed.catsid AS b_katid,
    begivenhed.dates AS dato,
    begivenhed.times AS start,
    begivenhed.titel AS titel,
    @num := if(@kategori = kategori.catname, @num + 1, 1) as row_number,
    @kategori := kategori.catname as dummy
    FROM jos_eventlist_categories AS kategori INNER JOIN jos_eventlist_dates AS begivenhed ON kategori.id = begivenhed.catsid
    WHERE kategori.publishedcat = 1
    GROUP BY kat_id, b_id
    HAVING row_number <= 5

    Thanks

    Daniel

    5 Jul 09 at 8:22 am

  77. Wow, big thanks!! Had a query that was using sub select to pull one max row and running at 55 seconds, broguht it down to 4 seconds with the join laid out above (brought it down to 0.8 with the addition of an index)

    Any idea if this can be accomplished in hql without the sub select?

    Trails

    16 Jul 09 at 1:21 pm

  78. Awesome SQL example, thank you for posting.
    How could this be taken a step further and list only the fruits which are in stock? Note…some types may have the same price and also the highest priced item may not be in-stock. Thanks

    | type | variety | price | in_stock |
    +——–+————+——-+———-+
    | apple | gala | 1.99 | yes
    | apple | fuji | 2.75 | yes
    | apple | limbertwig | 2.75 | yes
    | apple | delicious | 3.77 | No
    | orange | valencia | 3.59 | yes
    | orange | navel | 9.36 | yes
    | pear | bradford | 6.05 | yes
    | pear | bartlett | 2.14 | yes
    | pear | comice | 2.14 | No
    | cherry | bing | 2.55 | yes
    | cherry | chelan | 6.33 | yes

    Results to be:
    | type | variety | price | in_stock |
    +——–+————+——-+———-+
    | apple | fuji | 2.75 | yes
    | apple | limbertwig | 2.75 | yes
    | orange | valencia | 9.36 | yes
    | pear | bartlett | 6.05 | yes
    | cherry | chelan | 6.33 | yes
    +——–+————+——-+———-

    Chad

    30 Jul 09 at 3:16 pm

  79. I think it should be a matter of adding a simple WHERE constraint.

    SELECT type,
    SUBSTRING_INDEX(GROUP_CONCAT(variety ORDER BY price), ‘,’, 1) AS cheapest_variety,
    SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price), ‘,’, 1) AS cheapest_price
    FROM fruits
    WHERE in_stock = ‘yes’
    GROUP BY type;

    Paul Price

    30 Jul 09 at 3:43 pm

  80. Thanks. I’ll give it a try.

    Chad

    30 Jul 09 at 4:36 pm

  81. –To select the last/First row per group using mysql–
    Try this..
    It works well i hope

    SELECT * FROM T1 R WHERE R.ID IN (SELECT MAX(U.ID) FROM T1 U GROUP BY Name)

    Manikandan

    31 Jul 09 at 5:57 am

  82. Very nice article. Stating the approach up front is tremendously helpful; i.e., “[a]ll involve two steps: finding the desired value of price, and then selecting the rest of the row based on that.”

    So many discussions or writings about SQL show solutions to problems, but don’t present a stepwise approach on how to think through them. Thank you.

    Andrew Brown

    7 Aug 09 at 4:59 pm

  83. i have table of admiion list of 1000 candidates in order of merit of their percentage of marks in matriculation.
    the candidates have given choices of disciplines in order of their merit, which has been entered as ch1,ch2,ch3…..in diferent calumns
    i have to assign dscipline wise 40 seats in order of merit of marks, afterselecting 40 ineach disciplin, the next below candidate has to be offered second choice up to 40, and so on
    can u help me writing such query program fromentered list

    aslam turk

    21 Oct 09 at 2:23 am

  84. thanks,
    It helped me to query a max date from rows grouped.

    Arokia Raj S

    19 Nov 09 at 3:35 am

  85. great!!!

    ioooxz

    19 Nov 09 at 5:31 am

  86. I have table as

    Id Name Date Time
    1 S 1-Dec-2009 9:00

    2 N 1-Dec-2009 10:00

    1 S 1-Dec-2009 10:30

    1 S 1-Dec-2009 11:00

    2 N 1-Dec-2009 11:10
    Need query to display as

    Id Name Date Time

    1 S 1-Dec-2009 9:00

    1 S 1-Dec-2009 11:00

    2 N 1-Dec-2009 10:00

    2 N 1-Dec-2009 11:10
    My backend database is MS Access and using VB6 for Max and Min time

    shanks

    14 Dec 09 at 4:26 am

  87. thank you so much – very elegant and I can leave now to go drinking on Friday evening. Thanks.

    Darren

    22 Jan 10 at 5:44 pm

  88. Three years old and still as good as gold! Thank you!

    Ahned

    24 Jan 10 at 2:02 am

  89. Please send how to code to find the highest record from a field while the number is added with any string.
    for example “BIS001,BIS002,BIS009,BIS010,BIS003″…i need ans hear is BIS010

    Murugaveni.K

    29 Jan 10 at 2:27 am

  90. Please send a million dollars :)

    Xaprb

    29 Jan 10 at 9:29 am

  91. Micosoft SQL Server 2008 and maybe 2005 has a CROSS APPLY that can be used like this.

    SELECT distinct f3.*
    from fruits f1
    CROSS APPLY (
    SELECT TOP(2) f2.*
    from fruits f2
    where f1.type = f2.type
    ORDER BY f2.price
    ) f3

    x2

    3 Feb 10 at 5:05 pm

  92. Thank you very much!!!
    Ive been googling for more than two hours to find a solution to my SQL prob. Great Article!!! Timeless!!!

    badongen

    11 Feb 10 at 4:11 pm

  93. what about:

    create table Fruits
    (
    type varchar(10),
    variety varchar(10),
    price decimal(18,2)
    )

    insert into Fruits values( ‘apple’ , ‘gala’ , 2.79)
    insert into Fruits values( ‘apple’ , ‘fuji’ , 0.24)
    insert into Fruits values( ‘apple’ , ‘limbertwig’ , 2.87)
    insert into Fruits values( ‘orange’ , ‘valencia’ , 3.59)
    insert into Fruits values( ‘orange’ , ‘navel’ , 9.36)
    insert into Fruits values( ‘pear’ , ‘bradford’ , 6.05)
    insert into Fruits values( ‘pear’ , ‘bartlett’ , 2.14)
    insert into Fruits values( ‘cherry’ , ‘bing’ , 2.55)
    insert into Fruits values( ‘cherry’ , ‘chelan’ , 6.33)

    SELECT *, row_number() OVER (PARTITION BY type ORDER BY price) PriceOrder
    FROM Fruits

    SELECT *
    FROM (
    SELECT *, row_number() OVER (PARTITION BY type ORDER BY price) PriceOrder
    FROM Fruits
    ) F
    WHERE F.PriceOrder=1

    trunkate

    24 Feb 10 at 6:37 am

  94. [...] more advanced usage, I suggest you read Baron Schwartz's article on selecting the first/least/max row per group in SQL Share this [...]

  95. Reading through this great post and all the comments i could to find something that matches my dilemma. I have a daily dump of stock data that I need a single reporting query for to find all the trending data AND the current price AND the starting price. The trending data works great in my GROUP_CONCAT below making a URL ready array.

    The challenge is also getting the first and last out by date into columns I can also sort on, etc. The aggregating functions MIN,MAX,AVG,etc. don’t see this.

    SELECT …,
    AVG(price) `AVG price`,
    MIN(price) `MIN price`,
    MAX(price) `MAX price`,
    curr_price `CURR price`,
    GROUP_CONCAT(CONCAT(‘g1[',DATE_FORMAT(h.creation_stamp,'%b%d'),']=’,price) order by h.creation_stamp ASC SEPARATOR ‘&’ ) `Holdings Graph`

    FROM
    holditrack h
    WHERE
    ptype = [ptype]
    AND
    creation_stamp BETWEEN [startdate] AND [enddate]
    GROUP BY cusip ORDER by descr ASC

    Drew

    16 Mar 10 at 6:16 pm

  96. I think i found my own solution by doing this based on the LEFT JOIN solution above….

    SELECT h.ptype Product, h.symbol SYM, h.secnum SecNo, h.CUSIP, h.descr Name, h.Accts, h.sharequan Shares, h.marketval `Market Val`, CONCAT(count(DISTINCT h.hid), ‘ ‘, DATE_FORMAT(MAX(h.creation_stamp),’%b%d’)) `Postings Latest` , AVG(h.sharequan) `AVG Share Quan`, AVG(h.price) `AVG price`, MIN(h.price) `MIN price`, MAX(h.price) `MAX price`, h.curr_price `CURR price`, GROUP_CONCAT(DISTINCT DATE_FORMAT(h2.creation_stamp,’ %b%d’)), h2.price firstPrice, GROUP_CONCAT(DISTINCT h2.price) allH2Price, GROUP_CONCAT(DISTINCT CONCAT(‘g1[',DATE_FORMAT(h.creation_stamp,'%b%d'),']=’,h.price) ORDER BY h.creation_stamp ASC SEPARATOR ‘&’ ) `Holdings Grsaph` FROM holditrack h LEFT JOIN holditrack h2 ON h.cusip = h2.cusip AND h2.creation_stamp >= ’2010-02-10′ WHERE h.ptype = ‘Equities’ AND h.creation_stamp BETWEEN ’2010-02-10′ AND ’2010-03-31′ GROUP BY h.cusip ORDER BY h.descr ASC [ 9.3800749778748 ]

    I definitely get all that I am looking for as my little test columns confirm, but the left join turns a half second response into a 9 second response with just a months worth of data (100k rows in table)!

    Any help on optimizing the query is appreciated.
    (indexes are on creation_stamp, ptype, and cusip)

    explain shows nothing too taxing…

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE h ref ptype,creation_stamp ptype 57 const 11632 Using where; Using temporary; Using filesort
    1 SIMPLE h2 ref cusip,creation_stamp cusip 57 rrmanatrack_db.h.cusip 23

    Drew

    17 Mar 10 at 2:17 pm

  97. Thank you so much! I am learning this next level of mysql and you have written something I can understand very well for the specific problem I was trying to overcome. Thank You!

    cliffclof

    19 Mar 10 at 1:23 am

  98. [...] How to select the first/least/max row per group in SQL at Xaprb (tags: mysql tips) [...]

  99. dude..here you go…

    Kumar

    26 Mar 10 at 2:15 pm

  100. [...] This site contains more suggestions on how to perform a “Top N Per Group” query , and eventually the one I based my query on Xaprb [...]

  101. I simplified somewhat the first solution:

    the original solution was:
    select f.type, f.variety, f.price
    from (
    select type, min(price) as minprice
    from fruits group by type
    ) as x inner join fruits as f on f.type = x.type and f.price = x.minprice;

    my version is:
    select type, variety, price from (select type, min(price) as price from fruits group by type) as x natural join fruits;

    this works in mysql, seems simpler, I also don’t have to use the “f.” prefixes. My question is: is it any faster/slower than the original version?

    rty42

    11 May 10 at 6:44 am

  102. It’s essentially the same query, except that you’re letting the engine handle the JOIN constraints for you through using a NATURAL JOIN.

    Paul Price

    11 May 10 at 3:21 pm

  103. I hoped for it :)
    Thanks

    rty42

    12 May 10 at 6:39 am

  104. Really nice explnation.

    DD

    13 May 10 at 6:04 pm

  105. In SQL Server I usually join. This also works if more apples hava a similar price:

    SELECT
    type
    , variety
    , price
    FROM
    fruit AS f1
    WHERE EXISTS
    (
    SELECT TOP 1
    “With exits it does not matter what”
    FROM
    fruit f2
    WHERE /* only match id here */
    f1.type = f2.type
    AND f1.variety = f2.variety
    AND f1.price = f2.price
    ORDER BY
    f2.price DESC
    , f2.variety
    )

    Bart Kemps

    18 May 10 at 9:12 am

  106. In SQL Server I usually join. This also works if more apples hava a similar price:

    SELECT
    ““type
    ,“`variety
    ,“`price
    FROM
    ““fruit AS f1
    WHERE EXISTS
    ““(
    ““““SELECT TOP 1
    ““““““”With exits it does not matter what”
    ““““FROM
    ““““““fruit f2
    ““““WHERE /* only match id here */
    ““““““f1.type = f2.type
    ““““AND`f1.variety = f2.variety
    ““““AND`f1.price = f2.price
    ““““““ORDER BY
    ““““““““f2.price DESC
    ““““““,“`f2.variety
    ““)

    Bart Kemps

    18 May 10 at 9:15 am

  107. This is a great set of examples. I’m wondering if someone can help with a variant issue. lets say that instead of price I had a line called “destination” and I had many rows for the same product to the same customer but I wanted to create a table of products with just the FIRST record for each product based on sorting the table first on product and then on destination and then I wanted to get some additional information from jsut the first row for each product (perhaps the telephone number). How do I do this?

    When I do the obvious I fidn out the hard way that FIRST is the first record that matches, and not the first after a complex set of Ordered By. so I need to sort the query first (nested) and then use the sorted result query to do a group by and get the first record. Then I need to take the unique ID for this ron and use it to get other values on the record. It would be great if I had a column with a unqiue number within the group of records, but I don’t. Any ideas?

    Savin

    2 Jun 10 at 2:31 pm

  108. This is a wonderful article…

  109. Matthieu Haller noted that problems could arise in your example in cases where more than one fruit within a category had the minimum price. In many cases, you will want ALL the cheapest fruits within the category to be returned. But for my application, this behaviour would have been disastrous, so I added an aggregate function Min() to the fruits.variety to select one variety only for each type. Also, I wanted to select a completely different field from fruit’s properties, so I do something similar to your first solution, but instead of self-joining two copies of the same table, I self-join three.

    (SELECT cheapest_fruits_limitone_by_type_qry.type, cheapest_fruits_limitone_by_type_qry.fruit_field_with_other_details FROM
    (SELECT cheapest_fruits_by_type.type, Min(selected_fruits_by_type.variety) AS selected_variety FROM
    (SELECT fruits.type, Min(fruits.price) AS cheapest_price_by_type
    FROM fruits
    GROUP BY fruits.type)
    AS cheapest_fruits_by_type
    INNER JOIN fruits AS selected_fruits_by_type
    ON (cheapest_fruits_by_type.type=selected_fruits_by_type.type AND cheapest_fruits_by_type.cheapest_price_by_type=selected_fruits_by_type.price)
    GROUP BY cheapest_fruits_by_type.type)
    AS selected_fruits_by_type_qry
    INNER JOIN fruits AS cheapest_fruits_limitone_by_type_qry
    ON (cheapest_fruits_limitone_by_type_qry.variety=selected_fruits_by_type_qry.selected_variety))

    Limiting the selected rows strictly to ONE per category, means that when I do complex INNER JOINs based on this as a subquery, I don’t spawn a load of extra rows (as in a Cartesian-join) just because two fruits in the same category have the same price.

    Matthew Slyman

    29 Jun 10 at 9:33 am

  110. Help me

    I have 2 table

    1)Debitur
    __________________
    DIN | NAME
    —————–
    001 | AA
    001 | BB
    —————–

    2)Trans
    __________________
    DIN | NILAI
    —————–
    001 | 9500
    001 | 7500
    001 | 8000
    —————-

    Results to be:
    ____________________________
    DIN | NAME | NILAI
    —————————-
    001 | AA | 9500
    001 | BB | 7500
    001 | | 8000
    —————————

    ridwan

    14 Jul 10 at 10:36 am

  111. Just wanted to express my appreciation…thanks! I’m learning a lot!

    Matt l

    4 Aug 10 at 11:10 am

  112. Hi I have another problem which is not included in your examples. How about selecting the nth row from each group?

    Geocine

    5 Aug 10 at 4:07 am

  113. To Geocine: Try using LIMITs on returned row-counts, in conjunction with ORDERing and subqueries. IN other words, to select the 2nd row in each group,
    1. Construct a query for grouping your values appropriately.
    2. Use this as a subquery for a query that selects the TOP 2 rows from each group, ordered in ASCending order of whatever property you are using to select the nth row.
    3. Use this as a subquery for a query that ORDERs the rows in the opposite direction, and selects the TOP 1 row.

    You might need more than three levels of querying/ subquerying. Not sure. But this strategy should work, if my limited experience is anything to go by.

    Matthew Slyman

    5 Aug 10 at 5:45 am

  114. Has been a great help, many thanks

    k

    23 Aug 10 at 9:07 am

  115. This was a great help…

    Z

    27 Aug 10 at 3:48 pm

  116. A great help,many thanks…suggest some more articles related to sql queries example.Thanks once again

    sandeep

    27 Aug 10 at 11:06 pm

  117. Hey,

    Had the same problem to get this Top N (Top 3 Comments to n articles for me) Select fast, I decided to create a new seperate Index Table.

    Table: t1_index

    ArticleID | Comment1ID | Comment2ID | Comment3ID

    With CommentXID as the top 3 items.

    The speed increasment was tremendous: Database with 101k entrys, 0.0048 seconds instead of 4 seconds before which is around 1000 Times faster

    Patrick

    31 Aug 10 at 7:36 pm

  118. This article provided a lot of assistance, thank you.

    Tyler

    9 Sep 10 at 1:39 pm

  119. Grate job man. It helped me a lot. Thanks

    ooty

    14 Sep 10 at 8:21 am

  120. Thank you, its a great article, i’ve learned something new.

    Alex Serrano

    15 Sep 10 at 4:34 pm

  121. How about this, for selecting only the latest review for each application from a table containing many reviews about many applications:-

    select r1.reviewid,r1.applicationid,r1.reviewer,r1.dateadded
    from topmobileapps_applicationreview_data r1 LEFT JOIN topmobileapps_applicationreview_data r2 ON (r1.applicationid=r2.applicationid and r1.dateadded<r2.dateadded) where r2.applicationid IS NULL order by applicationid;

    the left join between the review table and itself, where each record in the right version is newer (less) than that on the left, means that the newest record for each (applicationid) on the left will always be unmatched. Filter out that, and you have the result.

    Simon

    22 Sep 10 at 6:02 am

  122. Simon,

    This would be much less efficient than some of the other methods described in the article – since your SQL system (unless extremely well optimised with symbolic SQL manipulation) would need to:

    1. Run a query that generates a number of records proportional to the _SQUARE_ of the number of records in your underlying table (since each record will be matched with EACH of the records having earlier [dateadded];
    2. Filter that as a subquery.

    That might work for a few thousand records but once your database grows to a few million, your users are surely going to be shouting at you about how their database is getting slower. Have a look at some of the other suggestions on the page.

    Matthew

    Matthew Slyman

    22 Sep 10 at 8:10 am

  123. Matthew, sure if every row joined to every other row, but that can’t happen, at most every reviews can join to every other review, for a single application. In my code, I’ve got a table with 100,000s applications and potentially 100s for each application. So at most it’d be n^2 for each application (where n is the number of reviews).

    In my test data (a sample of the real data), I’ve got roughly 10000 reviews with a maximum of 14 reviews on a single application. Using MySQL, the query comes back in less than 0.2 of a second.

    I tried the same thing, grouping by the various fields on the left hand side, as in your example, but it took 0.4 seconds.

    As an aside, explaining the query shows it only considered the same number of records as there are in the review table, so I’m guessing MySQL managed to optimise away the problem you’re suggesting (or I would have expected at least 1 row more than that). Thinking about it, filtering the right side of a left join on IS NULL, isn’t beyond science for the query parser to optimise, so it’s probably done just that.

    Simon

    22 Sep 10 at 12:09 pm

  124. Good clarification; and I can’t argue with actual profiling results!

    Matthew Slyman

    22 Sep 10 at 12:23 pm

  125. xaprb, nice article.

    it would be nice if you would answer my question, pendejo.

    BAAA

    24 Sep 10 at 2:38 pm

  126. Thanks very much. This article was ver helpful

    Sushil

    6 Oct 10 at 9:19 am

  127. The complex made simple …fruits!
    My pupils and I are very grateful to this article.

    jose sanchez rubio

    24 Nov 10 at 4:04 pm

  128. Sir,
    I tried to use this limit function exactly as given above in MS Access so that I can learn by example. But it gives a message of syntax error. Will i have to change the syntax in MS access. I think these syntax are meant for MySQL only. Can you please suggest as to how to write syntax for MSACCESS?

    SREEDHARAN

    26 Dec 10 at 9:19 pm

  129. @SREEDHARAN: Matthieu Haller wrote on this thread,

    “The same can be achieved on SQL server, using
    SELECT TOP N … instead of SELECT … LIMIT N .
    The same “ORDER BY” warnings apply.”

    I believe the same syntax is compatible with Microsoft Access (both in SQL Server passthrough queries, and in native Access JET queries.)

    Matthew Slyman

    27 Dec 10 at 4:25 am

  130. THANK YOU!
    Great job, man!
    THANK YOU!!
    Very helpful!
    THANK YOU!!
    Saved me a lot of time!
    THANK YOU!!!

    Daniele N.

    9 Jan 11 at 3:35 am

  131. You’re all legends – thanks for the help!

    John

    11 Feb 11 at 8:00 am

  132. Excellent example. Thanks for this.

    andy neill

    8 Mar 11 at 10:49 am

  133. Applying what you espoused here made my SQL query run 20x faster (4 seconds vs. 1/5 of a second). Many thanks.

    psmail007

    27 Mar 11 at 7:18 am

  134. :) This article is really going the distance, helped me a ton – thanks.

    kevin

    5 Apr 11 at 12:29 pm

  135. Thank you! Brilliant article and exactly what I am looking for. You took my SQL game up a big notch.

    Mike Taylor

    7 Apr 11 at 8:13 pm

  136. Great article!

    A proposed very easy/fast solution:

    Sort the table by price or date or whatever you want the min or max rows from for that group, remember to still group (see below).

    alter table fruits group by type order by price [asc or desc];
    (order it desc if you want max asc for min)

    then…

    select type, variety, max(price)
    from fruits
    group by type;

    Why this works:

    MySQL will take the extra data from the first row available for that ‘type’ your grouping on as well as the correct min or max values from the aggregation.

    This is probably the fastest solution for big tables… 5 million rows in 139.34 sec, no indexes, no fancy hardware. Go ahead and keep the table sorted and index it perhaps if u need repetition.

    Also try this in mysql 5+, may work without any sort, not sure why this seems to work… think you should only get 1 record back, but it does work…

    select type, variety, price
    from fruits
    group by type having max(price);

    Thoughts anyone?

    Crob

    14 Apr 11 at 5:03 pm

  137. Tks allot for this post it really helped me out (i think i’m sql rusty :), because its college knowledge, at least it should) cheers to you!!

    Luis Mesquita

    17 Apr 11 at 12:38 am

  138. Hi,
    beautiful infos. I’ve also seen the comments but I still can’t solve my problem: if I want not the chipest but the higher priced fruits (2 for each type), I can’t solve, because they are also ordered starting from the chipest.
    I also tried to reorder the hole table in descending mode but the result is also the same.
    Someone with any idea?
    Thanks

    asteddy

    27 Apr 11 at 9:12 am

  139. you can try this way to select extreme one record per group:
    select *
    (select * from fruits order by variety,type) as t group by t.variety

    chester wang

    11 May 11 at 6:03 am

  140. hi, i need help to select n bottom row group by date from my table:

    table Trans:
    Transno DateTrans
    001 01/03/2011
    002 01/03/2011
    003 01/03/2011
    004 01/03/2011
    001 01/04/2011
    002 01/04/2011
    003 01/04/2011
    001 01/05/2011
    002 01/05/2011
    003 01/05/2011
    004 01/05/2011
    005 01/05/2011

    I need to get output like this

    003 01/03/2011
    004 01/03/2011
    002 01/04/2011
    003 01/04/2011
    004 01/05/2011
    005 01/05/2011

    so I need to Get n bottom rows only depends on parameter i gave group by datetrans, for the output above for example 2 bottom rows for each date.
    Anyone can help me.

    Thanks in advance

    nick ang

    26 May 11 at 4:44 am

  141. I have two tables one is master and second is details as per below :
    Table Name : DeptMaster
    MasID Name
    101 HR
    102 FI

    Table Name : DeptDetails
    DetID Details MasID
    1 abc 101
    2 xyz 101
    3 pqr 101

    How to get top 1 record from DeptDetails table by considering inner join with DeptMaster
    Result Require :
    MasID Name Details DetID
    101 HR abc 1

    Please provide query to get the above result.

    Nitin

    29 Jun 11 at 7:59 am

  142. Thanks your article help me to solve my problem. This was really helpful

    Lakshmi

    11 Jul 11 at 3:47 pm

  143. Great article!!! Thanks a lot!!!
    It was a wonderful help to solve the problem I had.

    Thanks very much!!

    David

    13 Jul 11 at 7:00 am

  144. There is also an interesting query, similar to the inner select, which might be more optimized (to be tested!). But in my opinion, it is clearer to understand. It uses the NOT EXIST syntax instead. Like so:

    select type, variety, price
    from fruits
    where NOT EXIST (
    select * from fruits as f
    where f.type = fruits.type and f.price < fruits.price
    );

    Basically, you're telling the DB to find all the fruits where there is not other fruit with the same name but with a lower price. This should reduce the load a bit on the optimizer since the count is removed… I hope!

    Bernard

    3 Aug 11 at 11:03 pm

  145. I honestly don’t see why people keep complaining about SQL when it’s actually MySQL that’s holding them back.

    There are better systems available which implement newer features from the SQL standard that can do this kind of thing much more easily.

    Here’s the last query of your blog post (“cheapest 2 products of each type”) in standard SQL, using a window function (and a common table expression for improving readability):

    WITH ranked_fruits AS (
    SELECT type, variety, price,
    rank() OVER (PARTITION BY type
    ORDER BY price ASC) AS rank
    FROM fruits
    )
    SELECT type, variety, price
    FROM ranked_fruits
    WHERE rank <= 2
    ORDER BY type, price ASC;

    I've tested this and it works in PostgreSQL. I believe most of the proprietary databases (DB2, MSSQL, Oracle and Sybase – but not Informix?) understand this query as well. Firebird (another free software RDBMS with growing momentum) will support this as well in version 3.0, which is soon to be released.

    PS: Sorry if the formatting is off; there are no syntax instructions in the comment submission form.

    Peter Bex

    6 Oct 11 at 4:59 am

  146. Commenters,

    I am going to stop approving further comments that say “here is my data, give me the query that does what I need.” There are more than enough examples already that illustrate the principles my post teaches. Think, study, learn, apply, and become independently able to solve your own problems.

    Xaprb

    6 Oct 11 at 11:29 am

  147. I’ve been trying to alter the one pass technique so that it pulls the most expensive records, instead of the cheapest. How would I go about doing this?

    LK

    15 Oct 11 at 3:38 pm

  148. Just a quick note to say thanks – the instructions and examples are clear, informative and illuminating and allow part timers like me to do more than we could otherwise do. Thanks in abundance.

    David

    17 Oct 11 at 5:39 pm

  149. Thanks for this post, I was able to use the ideas to get the most recent and then everything that is not the most recent. Made life a lot easier, at least for this problem, onto the next.

    Sean

    15 Dec 11 at 3:58 pm

  150. Excellent! Thank you for this info.

    Jason Yousef

    6 Jan 12 at 2:18 pm

  151. Super helpful. Thanks for posting!

    Martha

    9 Jan 12 at 10:55 pm

  152. Thank you so much. Now my app looks serious :-))

    Ludovic

    24 Jan 12 at 2:49 pm

  153. Someone else pointed to a cursor solution. I found this considerable faster than the other solutions and far easier to understand. On my non too impressive box 12,000 rows are processed in 0.00.35.

    This gets up to the 10 highest graded locations – same problem as the fruit. Setting temp table engine to memory makes it slightly faster still.

    CREATE PROCEDURE getBestByCountry ()
    BEGIN

    DECLARE done INT DEFAULT FALSE;
    DECLARE _country VARCHAR(40);
    DECLARE cur1 CURSOR FOR SELECT DISTINCT country FROM Location;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN cur1;

    DROP TABLE IF EXISTS best;

    CREATE temporary TABLE best
    ( id INT UNSIGNED ) ;

    read_loop: LOOP
    FETCH cur1 INTO _country;
    IF done THEN
    LEAVE read_loop;
    END IF;

    INSERT INTO best (id)
    SELECT id FROM Location WHERE country = _country
    ORDER BY grade DESC
    LIMIT 10;

    END LOOP;

    CLOSE cur1;

    SELECT b.* — get rest of data
    FROM best t,
    listBest b
    WHERE t.id = b.id
    ORDER BY b.country, b.grade DESC, b.overallRating DESC
    ;

    END;

    Stephen

    29 Jan 12 at 6:07 pm

Leave a Reply