Xaprb

Stay curious!

How to select the first/least/max row per group in SQL

with 75 comments

Here are some common SQL problems, all of which have related solutions: how do I find the most recent log entry for each program? How do I find the most popular item from each category? How do I find the top score for each player? In general, these types of “select the extreme from each group” queries can be solved with the same techniques. I’ll explain how to do that in this article, including the harder problem of selecting the top N entries, not just the top 1.

This topic is related to numbering rows, which I just wrote about (see my articles about MySQL-specific and generic techniques to assign a number to each row in a group). Therefore I’ll use nearly the same table and data as I used in those articles, with the addition of a price column:

+--------+------------+-------+
| type   | variety    | price |
+--------+------------+-------+
| apple  | gala       |  2.79 | 
| apple  | fuji       |  0.24 | 
| apple  | limbertwig |  2.87 | 
| orange | valencia   |  3.59 | 
| orange | navel      |  9.36 | 
| pear   | bradford   |  6.05 | 
| pear   | bartlett   |  2.14 | 
| cherry | bing       |  2.55 | 
| cherry | chelan     |  6.33 | 
+--------+------------+-------+

Selecting the one maximum row from each group

Let’s say I want to select the most recent log entry for each program, or the most recent changes in an audit table, or something of the sort. This question comes up over and over on IRC channels and mailing lists. I’ll re-phrase the question in terms of fruits. I want to select the cheapest fruit from each type. Here’s the desired result:

+--------+----------+-------+
| type   | variety  | price |
+--------+----------+-------+
| apple  | fuji     |  0.24 | 
| orange | valencia |  3.59 | 
| pear   | bartlett |  2.14 | 
| cherry | bing     |  2.55 | 
+--------+----------+-------+

There are a few common solutions to this problem. All involve two steps: finding the desired value of price, and then selecting the rest of the row based on that.

One common solution is a so-called self-join. Step one is to group the fruits by type (apple, cherry etc) and choose the minimum price:

select type, min(price) as minprice
from fruits
group by type;
+--------+----------+
| type   | minprice |
+--------+----------+
| apple  |     0.24 | 
| cherry |     2.55 | 
| orange |     3.59 | 
| pear   |     2.14 | 
+--------+----------+

Step two is to select the rest of the row by joining these results back to the same table. Since the first query is grouped, it needs to be put into a subquery so it can be joined against the non-grouped table:

select f.type, f.variety, f.price
from (
   select type, min(price) as minprice
   from fruits group by type
) as x inner join fruits as f on f.type = x.type and f.price = x.minprice;
+--------+----------+-------+
| type   | variety  | price |
+--------+----------+-------+
| apple  | fuji     |  0.24 | 
| cherry | bing     |  2.55 | 
| orange | valencia |  3.59 | 
| pear   | bartlett |  2.14 | 
+--------+----------+-------+

Another common way to do this is with a correlated subquery. This can be much less efficient, depending on how good your system’s query optimizer is. You might find it clearer, though.

select type, variety, price
from fruits
where price = (select min(price) from fruits as f where f.type = fruits.type);
+--------+----------+-------+
| type   | variety  | price |
+--------+----------+-------+
| apple  | fuji     |  0.24 | 
| orange | valencia |  3.59 | 
| pear   | bartlett |  2.14 | 
| cherry | bing     |  2.55 | 
+--------+----------+-------+

Both queries are logically equivalent, though they may not perform the same.

Select the top N rows from each group

This is a slightly harder problem to solve. Finding a single row from each group is easy with SQL’s aggregate functions (MIN(), MAX(), and so on). Finding the first several from each group is not possible with that method because aggregate functions only return a single value. Still, it’s possible to do.

Let’s say I want to select the two cheapest fruits from each type. Here’s a first try:

select type, variety, price
from fruits
where price = (select min(price) from fruits as f where f.type = fruits.type)
   or price = (select min(price) from fruits as f where f.type = fruits.type
      and price > (select min(price) from fruits as f2 where f2.type = fruits.type));
+--------+----------+-------+
| type   | variety  | price |
+--------+----------+-------+
| apple  | gala     |  2.79 | 
| apple  | fuji     |  0.24 | 
| orange | valencia |  3.59 | 
| orange | navel    |  9.36 | 
| pear   | bradford |  6.05 | 
| pear   | bartlett |  2.14 | 
| cherry | bing     |  2.55 | 
| cherry | chelan   |  6.33 | 
+--------+----------+-------+

Yuck! That can be written as a self-join, but it’s just as bad (I leave it as an exercise for the reader). This gets worse as you go to higher numbers (top 3, top 4…). There are other ways to phrase the statement, but they all boil down to the same thing, and they’re all pretty unwieldy and inefficient.

There’s a better way: select the variety from each type where the variety is no more than the second-cheapest of that type.

select type, variety, price
from fruits
where (
   select count(*) from fruits as f
   where f.type = fruits.type and f.price < fruits.price
) <= 2;

This is elegant, and lets you vary N without rewriting your query (a very good thing!), but it’s functionally the same as the previous query. Both are essentially a quadratic algorithm relative to the number of varieties in each type. And again, some query optimizers may not do well with this and make it quadratic with respect to the number of rows in the table overall (especially if no useful index is defined), and the server might get clobbered. Are there better ways? Can it be done with one pass through the data, instead of the many passes required by a correlated subquery? You know it can, or I wouldn’t be writing this, now would I?

Use UNION

If there’s an index on (type, price), and there are many more records to eliminate than to include in each group, a more efficient single-pass method (especially for MySQL, but also for some other RDBMSs) is to break the queries out separately and put a limit on each, then UNION them all back together. Here’s the syntax you need for MySQL:

(select * from fruits where type = 'apple' order by price limit 2)
union all
(select * from fruits where type = 'orange' order by price limit 2)
union all
(select * from fruits where type = 'pear' order by price limit 2)
union all
(select * from fruits where type = 'cherry' order by price limit 2)

Peter Zaitsev has written in detail about this technique, so I won’t go into it too much more here. If it suits your purposes, it can be a very good solution.

One note: use UNION ALL, not just UNION. It prevents the server sorting the results to eliminate duplicates before returning them. In this case there will be no duplicates, so I’m telling the server to skip that (useless, expensive) step.

Do it with user variables on MySQL

The UNION trick is an especially good idea when the results are a small fraction of the rows in the table and there is an index that can be used for sorting the rows. Another linear-time technique, which might be a good option in cases where you are selecting most of the rows from the table anyway, is user variables. This is MySQL-specific. Please refer to my previous post on how to number rows in MySQL for the gory details of why this works:

set @num := 0, @type := '';

select type, variety, price
from (
   select type, variety, price,
      @num := if(@type = type, @num + 1, 1) as row_number,
      @type := type as dummy
  from fruits
  order by type, price
) as x where x.row_number <= 2;

This isn’t one pass through the table, by the way. The subquery is implemented as a temporary table behind the scenes, so filling it with data is one pass; then selecting every row from it and applying the WHERE clause is another. However, twice through is still O(n) with respect to the table size. That’s a lot better than correlated subqueries, which are O(n2) with respect to the group size — even moderate group sizes cause bad performance (say there are five varieties of each fruit. That’s on the order of 25 passes through the table, all told).

One-pass technique on MySQL… maybe?

If you want to leave your queries up the the query optimizer’s whims, you can try this technique, which builds no temporary tables and makes just one pass through:

set @num := 0, @type := '';

select type, variety, price,
      @num := if(@type = type, @num + 1, 1) as row_number,
      @type := type as dummy
from fruits
group by type, price, variety
having row_number <= 2;

This theoretically ought to work if MySQL orders by the GROUP BY criteria, which it sometimes does for efficiency and to produce the expected results. Does it work? Here’s what it returns on MySQL 5.0.27 on Windows:

+--------+----------+-------+------------+--------+
| type   | variety  | price | row_number | dummy  |
+--------+----------+-------+------------+--------+
| apple  | gala     |  2.79 |          1 | apple  |
| apple  | fuji     |  0.24 |          3 | apple  |
| orange | valencia |  3.59 |          1 | orange |
| orange | navel    |  9.36 |          3 | orange |
| pear   | bradford |  6.05 |          1 | pear   |
| pear   | bartlett |  2.14 |          3 | pear   |
| cherry | bing     |  2.55 |          1 | cherry |
| cherry | chelan   |  6.33 |          3 | cherry |
+--------+----------+-------+------------+--------+

Look closely… it’s returning rows one and three from each group, and they’re not numbered in order of increasing price? Huh? But the HAVING clause says the row_number should be no greater than 2! Here’s what it returns on version 5.0.24a on Ubuntu:

+--------+------------+-------+------------+--------+
| type   | variety    | price | row_number | dummy  |
+--------+------------+-------+------------+--------+
| apple  | fuji       |  0.24 |          1 | apple  |
| apple  | gala       |  2.79 |          1 | apple  |
| apple  | limbertwig |  2.87 |          1 | apple  |
| cherry | bing       |  2.55 |          1 | cherry |
| cherry | chelan     |  6.33 |          1 | cherry |
| orange | valencia   |  3.59 |          1 | orange |
| orange | navel      |  9.36 |          1 | orange |
| pear   | bartlett   |  2.14 |          1 | pear   |
| pear   | bradford   |  6.05 |          1 | pear   |
+--------+------------+-------+------------+--------+

Look, this time everything is numbered 1 and every row is returned. Wonky. This is exactly what the MySQL manual page on user variables warns about.

This technique is pretty much non-deterministic, because it relies on things that you and I don’t get to control directly, such as which indexes MySQL decides to use for grouping. However, if you need to use it — and I know there are some folks out there who do, because I’ve consulted for them — you can still tweak it. We’re getting into the realm of really bastardizing SQL, but the results above came from a table without indexes other than the primary key on (type, variety). What happens if I add an index MySQL can use for grouping?

alter table fruits add key(type, price);

Nothing changes, and EXPLAIN shows why: the query doesn’t use the index I just added. Why? Because the grouping is on three columns, and the index is only on two. In fact, the query is using a temp table and filesort anyway, so this is still not achieving the once-through goal. I can force it to use the index:

set @num := 0, @type := '';

select type, variety, price,
      @num := if(@type = type, @num + 1, 1) as row_number,
      @type := type as dummy
from fruits force index(type)
group by type, price, variety
having row_number <= 2;

Let’s see if that works:

+--------+----------+-------+------------+--------+
| type   | variety  | price | row_number | dummy  |
+--------+----------+-------+------------+--------+
| apple  | fuji     |  0.24 |          1 | apple  | 
| apple  | gala     |  2.79 |          2 | apple  | 
| cherry | bing     |  2.55 |          1 | cherry | 
| cherry | chelan   |  6.33 |          2 | cherry | 
| orange | valencia |  3.59 |          1 | orange | 
| orange | navel    |  9.36 |          2 | orange | 
| pear   | bartlett |  2.14 |          1 | pear   | 
| pear   | bradford |  6.05 |          2 | pear   | 
+--------+----------+-------+------------+--------+

Ah, now we’re cooking! It did what I wanted, without a filesort or temporary table. Another way to do this, by the way, is to take variety out of the GROUP BY so it uses the index on its own. Because this selects a non-grouped column from a grouped query, this only works if you are running with ONLY_FULL_GROUP_BY mode turned off, which I hope you are not doing without good reason.

Other methods

Be sure to check the comments for user-contributed methods. There are some really novel approaches. I always learn so much from your comments… thank you!

Conclusion

Well, that’s it. I’ve shown you several ways of solving the common “get the extreme row from each group” query, and then moved on to how you can get the top N rows from each group in various ways. Then I dove into MySQL-specific techniques which some (including myself, depending on my mood) would regard as mildly foolish to utterly stupid. But if you need the last bit of speed out of your server, you sometimes have to know when to break the rules. And for those who think this is just MySQL foolishness, it’s not; I’ve seen people desperately do these types of things on other platforms too, such as SQL Server. There are hacks and tweaks on every platform, and people who need to use them.

I hope you’ve enjoyed and profited from this discussion. If you did, maybe you’d like to subscribe for convenient notification of future articles. Happy coding!

Written by Xaprb

December 7th, 2006 at 8:59 am

Posted in SQL

Tagged with

75 Responses to 'How to select the first/least/max row per group in SQL'

Subscribe to comments with RSS or TrackBack to 'How to select the first/least/max row per group in SQL'.

  1. [...] How to select the first/least/max row per group in SQL [...]

  2. I believe you can also solve the “cheapest 2 from each group” problem with a LEFT JOIN:

    SELECT f.type, f.variety, f.price
    FROM fruits f
    LEFT JOIN fruits f2 ON (f.variety = f2.variety
      AND f.price <= f2.price)
    GROUP BY f.type, f.variety, f.price
    HAVING COUNT(*) <= 2
    

    This is untested SQL, but hopefully you get the idea.

    Perrin Harkins

    7 Dec 06 at 4:57 pm

  3. The concept is right, but you got your criterion slightly mixed.

    I knew about this one but somehow didn’t think to include it. Thanks for the reminder. It is equivalent to the correlated subquery method, but without subqueries of course. (it is still O(n^2) though).

    Xaprb

    7 Dec 06 at 5:06 pm

  4. Excellent! This taught me how to solve a problem I was working on.

    Thanks!

    Marc

    29 Dec 06 at 6:24 pm

  5. what about:

    select type,substr(min(concat(lpad(cast((price*100) as char),10,'0'),variety)),11),min(price) from fruits;

    one pass, zero subqueries, the only drawback is that it can’t be used with float type. It can also use a index for the min if you materialize the expression in the min with a trigger and index it.

    Cristiano

    16 Jan 07 at 11:52 am

  6. Cristiano, that’s a novel approach, and a great tip, though you forgot the GROUP BY type that’s necessary to prevent non-deterministic results. For a sparsely grouped table (e.g. thousands of rows per type), adding an extra column and indexing it would probably have diminishing returns, but I can see some situations where it would be a very good idea.

    Thanks for writing in; this is an approach I’d never thought about. Now I have another possible way to solve problems!

    Xaprb

    16 Jan 07 at 2:40 pm

  7. This was unbelievably helpful

    Isaac

    26 Jan 07 at 10:46 pm

  8. Great job. Thanks a lot for your article, very helpful.

    Enrique

    14 Feb 07 at 3:30 am

  9. Thanks for this superb article.

    Raj

    1 Mar 07 at 2:58 am

  10. RE: Select the top N rows from each group

    To get better performance for table with large number of rows, using identity() function and temporary table (if your database has this two feature). The following scripts have been tested in SQL Server 2000.

    create table fruits
    (
      type varchar(10),
      variety varchar(10),
      price decimal(18,2)
    )
    
    insert into fruits values( 'apple'  , 'gala'       ,  2.79)
    insert into fruits values( 'apple'  , 'fuji'       ,  0.24)
    insert into fruits values( 'apple'  , 'limbertwig' ,  2.87)
    insert into fruits values( 'orange' , 'valencia'   ,  3.59)
    insert into fruits values( 'orange' , 'navel'      ,  9.36)
    insert into fruits values( 'pear'   , 'bradford'   ,  6.05)
    insert into fruits values( 'pear'   , 'bartlett'   ,  2.14)
    insert into fruits values( 'cherry' , 'bing'       ,  2.55)
    insert into fruits values( 'cherry' , 'chelan'     ,  6.33)
    
    select type, variety, price, identity(int, 1, 1) as price_rank
    into #fruits
    from fruits
    order by type, price
    
    select f1.type, f1.variety, f1.price
    from #fruits f1,
    (
    	select type, min(price_rank) as min_price_rank
    	from #fruits
    	group by type
    ) f2
    where f1.type = f2.type
      and f1.price_rank < f2.min_price_rank + 2

    Haitao

    12 Mar 07 at 9:18 pm

  11. Thanks Haitao. For those who don’t know, prefixing a table name with # in SQL Server makes it a temporary table.

    Xaprb

    13 Mar 07 at 7:53 am

  12. RE: Median calculation made easy by reusing the ‘Select the top N rows from each group’ query

    Median calculation is one of the nightmares for some database developers (including myself), and it is disappointing that SQL Server 2005 still does not have the function available, still not. The calculation I googled so far are all correlation queries which are far too complex and not practical for tables with millions of records, because of the poor performance. But I just realize that the code I posted above just one step away from the median function, haha gotya…

    create table fruits (type varchar(10), variety varchar(10), price decimal(18,2))
    
    insert into fruits values( 'apple'  , 'gala'       ,  2.79)
    insert into fruits values( 'apple'  , 'fuji'       ,  0.24)
    insert into fruits values( 'apple'  , 'limbertwig' ,  2.87)
    insert into fruits values( 'orange' , 'valencia'   ,  3.59)
    insert into fruits values( 'orange' , 'navel'      ,  9.36)
    insert into fruits values( 'pear'   , 'bradford'   ,  6.05)
    insert into fruits values( 'pear'   , 'bartlett'   ,  2.14)
    insert into fruits values( 'cherry' , 'bing'       ,  2.55)
    insert into fruits values( 'cherry' , 'chelan'     ,  6.33)
    
    select type, variety, price, identity(int, 1, 1) as price_rank
    into #fruits
    from fruits
    order by type, price
    
    select f1.type, avg(f1.price) as median_price
    from #fruits f1,
    (
      select  type,
                min(price_rank) as min_price_rank,
                max(price_rank) as max_price_rank
    from #fruits
    group by type
    ) f2
    where f1.type = f2.type
      and (	   f1.price_rank = (max_price_rank + min_price_rank)     / 2
            or f1.price_rank = (max_price_rank + min_price_rank + 1) / 2)
    group by f1.type

    Basically it is to average the middle two(if even) or one(odd) value in a group.

    Being tested under SQL Server 2000 for a table with 2 million records, it took less than a minute. The part took most of the time is to generate the temporary table, so make sure you only put the necessary fields into the #table.

    Hope this helps.

    Also, cheers to Xaprb for the well-maintained blog.

    Haitao

    13 Mar 07 at 7:30 pm

  13. Hello,

    I have a simple SQL question.

    Here is the output I obtain when I issue the following command :

    SELECT code_age_01, nuitees, COUNT(*) FROM stats_2007_03_15__00_03_29 GROUP BY code_age_01, nuitees;
    
    +-------------+---------+----------+
    | code_age_01 | nuitees | COUNT(*) |
    +-------------+---------+----------+
    | A           |       1 |        3 |
    | A           |       2 |       11 |
    | A           |       3 |        3 |
    | A           |       5 |        5 |
    | A           |       8 |        1 |
    | B           |       1 |        5 |
    ... snipped ...
    | H           |       4 |        1 |
    | H           |       8 |        1 |
    | I           |       0 |        1 |
    | I           |       2 |        2 |
    | I           |       4 |        1 |
    | I           |       6 |        1 |
    | I           |       8 |        1 |
    +-------------+---------+----------+

    The question is as follows :

    Is there a way to reformat the SQL command in such a way that the resulting table would be what I call “a two-way table,” that is :

    - the top row would list all the “nuitees” values,
    - the first column would list all the “code_age_01″ values
    - all the other cells of the table would list the “count(*)” value for the “nuitees” value of the column and the “code_age_01″ value of the row.

    In other words, how can I obtain the following output :

       0  1  2  3  4  5  6  7  8  9
    ==================
    A
    B
    C
    D
    E
    F               values
    G                 for
    H               count(*)
    I

    Thank you,

    Pierre Caron

    Pierre Caron

    15 Mar 07 at 12:53 am

  14. Bonjour Pierre,

    What you need is called a “cross tabulation” or “pivot table.” Giuseppe Maxia wrote a great article about cross tabulations some time ago.

    Xaprb

    15 Mar 07 at 9:04 am

  15. On the topic “Selecting the one maximum row from each group”, how would this look if you got data from two tables, e.g. you split this table up with a column productid to link them together. In one table you have productid, type and variety. In the second table you have productid and price. Is it still possible to get it into one SQL statement?

    RN

    19 Mar 07 at 12:33 pm

  16. Certainly. You can do a subquery in the FROM clause, joining the two tables together in the suquery. Now they appear as one table to the rest of the query (hence the reason this is sometimes called a “derived table,” which isn’t technically incorrect, but it’s a misuse of the term).

    That’s just a one-size-fits-all answer. In any given query there are likely to be many other ways to do it.

    Xaprb

    19 Mar 07 at 1:23 pm

  17. I have a similar problem I am trying to solve, but in my case I need to get only the top 1 per group, but within the group I need the greatest date and ID.

    I was using the following

    select
    	a.CampaignID,
    	a.Email,
    	a.TS
    into #tempLog
    from dbo.xLog a
    inner join (select email,max(TS) as TS from xLog where campaignid = @campaignid group by email) b
    on a.email=b.email and
       a.TS = b.TS
    where campaignid = @campaignid

    This did not work because it turned out I had duplicate records with the same date (TS) per email.

    The xLog table also has an ID (identity) field, but I need to firstly grab the latest timestamp before getting the greatest ID.

    I’m believe this will give me the correct results, although I don’t know how efficient it is.

    select
    	a.CampaignID,
    	a.Email,
    	a.TS
    into #tempLog
    from dbo.xLog a
    inner join (select email,max(cast(TS as varchar(24)) + cast(ID as nvarchar(10)) ) as X from xLog where campaignid = @campaignid group by email) b
    on a.email=b.email and
       cast(a.TS as varchar(24)) + cast(a.ID as nvarchar(10)) = b.X
    where campaignid = @campaignid

    Talahaski

    21 Mar 07 at 12:33 pm

  18. I found the “Selecting the one maximum row from each group” very helpful as well. It helped me write a query that I was finding difficult – joining a specific record from a grouped subquery back to the original table that I was grouping on in the subquery. I could not have done it without the clear and useful examples posted here.

    Byron V

    24 Mar 07 at 4:00 pm

  19. Genius and a life saver

    Carl

    30 Mar 07 at 7:47 pm

  20. This helped me do my task in an elegant way! Thanks for posting this! :)

    trebskie

    30 Mar 07 at 9:48 pm

  21. I was googling for a half hour and couldn’t find what I needed until I got here. Very clear explanation and exactly the code that I needed. Thanks very much!

    David

    3 Apr 07 at 5:49 pm

  22. Hello
    great article :)
    it would be interesting to edit the article and add Cristiano technique :-)

    matt

    16 Apr 07 at 11:47 pm

  23. Those are clear examples when using aggregate functions. I need to get the first record from a group, and the field is a char field. I am using advantage sql and do not have a first() func.

    Specifically, I am linking two tables by a common field, Jobno. I need all
    records from table a and only the first from b with the same jobno.

    table a
    jobno,quantity
    111,22
    112,23
    
    table b
    jobno,billno
    111,aaa
    111,aaa
    111.bbb
    112,ccc
    
    The result I need is
    jobno,billno
    111,aaa
    112,ccc

    Thanks – Ken

    Ken

    18 Apr 07 at 2:25 pm

  24. Ken, I don’t know anything about SQL Advantage, but you should be able to apply the techniques I mention in this article. Don’t make the common mistake of joining and then trying to extract. That is aggregation at the wrong place. Instead, aggregate and get the first record from table b. Now join that result to table a.

    I have never heard of a first() function.

    Xaprb

    18 Apr 07 at 2:53 pm

  25. Edit: Sorry, I didn’t include formatting the first time I posted.

    I’ve been searching high and low across the Internet kingdom for a solution to a problem I have. Your post has probably given the most insight so far but I’m still not there yet.

    I have two tables – category references and posts and I would like to obtain the latest 10 posts for each of the categories with all results ordered in descending date order.

    Category Reference Table

    Stores cat_id to reference a category in a separate table and post_id referencing the post from the table below. This is a referential table so that multiple categories can be assigned to each post.

    +--------+------------+
    | cat_id  | post_id     |
    +--------+------------+
    

    Posts Table

    The important fields are id and date. The date field needs to be used for all ordering.

    +--------+------------+
    | id        | date         |
    +--------+------------+

    I don’t want to have to resort to using separate queries so if the solutions provided could some how be applied to this, that would be excellent.

    Thanks!

    Rez

    20 Apr 07 at 8:31 pm

  26. [...] How to select the first/least/max row per group in SQL – Xaprb (tags: sql programming howto db tips aggregate) [...]

  27. Simply great page.
    It has taken me years to learn the hard way what’s in this page, and I still learned a lot by reading it.
    Thanks.

    Warning, the different methods give the same results because each price is DIFFERENT in each group.
    You have to be very carefull with queries like that if it’s not the case.

    If you want EXACTLY one line, Cristiano’s method is the one you want. It has also the advantage to be deterministic : at equal price, you get the first variety alphabetically. Sometimes what you get is the last alphabetically (if you had searched for the max instead).
    If that is a problem, you can play with binary invertion of part of the string to get the correct order back. If your locale makes the correct order inconsistent with character codes, you can use a fixed lengh column that contains a proper “reverse order” string corresponding to variety and your locale.

    The other methods will give all the lines with the best price. If that’s what you want, choose one of them.

    The top N of each group can be even more confusing :

    The first two examples given by Xaprb are not equivalent :

    the first will give all the fruits per type that is in the top N prices of the type.

    the second will give all the fruits per type that have less than
    N cheaper fruits. In other words, the N cheapest of each type evry fruits of the same price and category of one of these N.

    In both cases, you may get more than N rows per category.

    The UNION solution is the one that gives N maximum lines per category. Adding fruit to the ORDER BY clause allows to get deterministic results, but unfortunatly sometimes at a high performance cost.

    The same can be achieved on SQL server, using
    SELECT TOP N … instead of SELECT … LIMIT N .
    The same “ORDER BY” warnings apply.

    Some implementations allow to use Select TOP N or SELECT … LIMIT N
    in a subquery that refers to colums in the main query.
    When allowed you can have the top N for each group this way.
    When disallowed, error messages are generally cryptic, or sometimes results are false.
    When acceptable, performace wise, the second method, with a prorer order by and a removing of the extra unwanted lines per type at programming level is the best in my experience.

    Matthieu Haller

    17 Jul 07 at 11:51 am

  28. Fantastic, so simple! As I needed exactly one row I used Cristiano’s method (thanks for the tip Matthieu) and converted it to SQL server 2000 syntax, and in case this helps anyone else:

    select type,
    substring((
    min(
    right(replicate(’0′,10) convert(varchar(10),price*100),10) convert(varchar(10),variety)
    ))
    ,10,10 –add maximum possible length for variety here
    )

    ,min(price)
    from fruit
    group by type

    I tested this on my own data structure, then replaced the column and table name for those in the example.

    Michael Caldow

    2 Aug 07 at 12:49 pm

  29. If you dont expect too many values of a single type then it is possible to create a list of values of single type with a GROUP_CONCAT(cID) GROUP BY cType
    1,2,3,…
    4,5,6,…
    7,8,9,…

    select only first N values SUBSTRING_INDEX(cList, ‘,’, 2)
    1,2
    4,5
    7,8

    join with the table again
    1 – 1,2
    1 – 4,5
    1 – 7,8
    2 – 1,2
    ….

    select values that are in list – FIND_IN_SET(CID, CList)
    1 – 1,2
    2 – 1,2
    4 – 4,5
    5 – 4,5
    7 – 7,8
    8 – 7,8

    SELECT cID, … FROM Table
    LEFT JOIN (
    SELECT SUBSTRING_INDEX(cList,’,',10) AS cList FROM (
    SELECT GROUP_CONCAT(cID) AS cList FROM Table GROUP BY cType
    ) AS T1
    ) AS T2
    ON FIND_IN_SET(cID, cList)>0
    WHERE cList IS NOT NULL

    It might looks like a complicated but it is not, and it is MUCH faster than processing data outside SQL, keep on your mind that GROUP_CONCAT can result only

    Artur Filipiak

    7 Aug 07 at 5:58 am

  30. Hi! I have the following problem what i have this query i made from the article of finding duplicate rows but the downside of it is that it won’t put all the duplicates but only one because of the group by, i suppose… what i want is to show all IPs alike and their accounts

    Could you help me in this?

    Here is the query i’m using:
    select username, last_ip from account group by last_ip HAVING count(*) >1

    ——– ———–
    |username|last_ip |
    ——– ———–
    |abc |212.10.10.1|
    |cde |212.10.10.1|
    |ghi |198.12.50.4|
    |ijk |198.12.20.2|
    ——– ———–

    Syanneth

    17 Dec 07 at 10:29 pm

  31. if use GROUP statement,it will be a big problem to ORDER BY some fields which are not unique.

    angel

    4 Jan 08 at 6:41 am

  32. thank you very much.

    martin

    15 Jan 08 at 6:07 am

  33. Thanks! you helped me solve a problem. On Oracle databases only, a good way of limiting the number of rows returned is to use the system var ‘rownum’:
    eg: select *
    from ( select ixa.alert_id , ixa.alert_title, to_char(ixa.alert_date,’mm/dd/yyyy’)
    from iss_xf_alerts ixa
    where ixa.oid = ‘5443′)
    order by ixa.alert_date desc) where rownum

    mike

    18 Feb 08 at 12:15 pm

  34. Great – very helpful!

    I’d be interested in the performance implications of these various methods – it’s easy testing on a simple db with a few rows, but how does (e.g. from your ‘Selecting the one maximum row from each group’):

    select f.type, f.variety, f.price
    from (
    select type, min(price) as minprice
    from fruits group by type
    ) as x inner join fruits as f on f.type = x.type and f.price = x.minprice;

    Work with a few million rows?

    Cheers,
    -Dan

    Dan

    24 Mar 08 at 2:57 am

  35. Genius ! this article just solve my half year headache :p

    Oh and dan I just use this methods on hundred thousands of data, and no performance issues so far

    WAWAN

    25 Mar 08 at 3:49 am

  36. Hello Xaprb.
    I’m novice in SQL statements.
    I have one question if is possible
    There is anyone how know how to make a query (using SQL selection) in access from one table.
    (The table is made in MS Access 2003).
    My table is like this:
    - Name of table: database
    - Columns: User_Name, Telephone, Time_call, Call_Duration_minutes.
    I want to see the first 3 longest values for Call_Duration_minutes
    Grouped by User_Name
    I want the results to be something like:
    User_Name Telephone Time_Call Call_Duration_minutes
    Aida 434443343 12:23 15
    Aida 34342234 14:32 14
    Aida 75656544 9:34 11
    Cobal 44554554 19:34 57
    Cobal 56544535 23:34 45
    Cobal 89767554 01:23 5
    Bob 34555432 03:45 125
    Bob 78678676 12:23 21
    Bob 67565444 17:54 11
    - – - – – - – -ETC ETC ETC ETC – - – – - -
    I made some queries but this one is to complicate for me,
    Thank you very much!
    Cheers,
    CRizz

    CRIzz

    8 Apr 08 at 4:41 pm

  37. Great Article!!!

    I just want to point a couple of SQL extensions that I think it could be very helpfull and powerfull and not very difficult to implement.

    Imagine a function called:
    LPOSITION_IN_INDEX( , , ,…)

    This function would return the position that a key would be inserted the key in a index. If the key alredy exists (not unique index) it would return the position just before (Left position).

    Suposing tht the table has the indes:

    ALTER TABLE fruits ADD INDEX XTypePrice(type,price);

    the select would be:

    SELECT type,variety,price
    FROM fruits
    WHERE LPOSITION_IN_INDEX( XTypePrice , type, price)-LPOSITION_IN_INDEX(XTypePrice, type,0)

    Jordi Baylina

    16 Apr 08 at 2:12 pm

  38. Great Article!!!

    I just want to point a couple of SQL extensions that I think it could be very helpfull and powerfull and not very difficult to implement.

    Imagine a function called:
    LPOSITION_IN_INDEX(&ltINDEX_NAME&gt ,&ltKEY_1stPART&gt ,[KEY_2ndPART] ,…)

    This function would return the position that a key would be inserted the key in a index. If the key alredy exists (not unique index) it would return the position just before (Left position).

    Suposing tht the table has the indes:

    ALTER TABLE fruits ADD INDEX XTypePrice(type,price);

    the select would be:

    SELECT type,variety,price
    FROM fruits
    WHERE LPOSITION_IN_INDEX( XTypePrice , type, price)-LPOSITION_IN_INDEX(XTypePrice, type,0) &lt2

    Jordi Baylina

    16 Apr 08 at 6:05 pm

  39. Another extension that I beleve that would be good is to define a function similar to MIN or MAX but with an extra argument. Lets call them MIN2 and MAX2

    the select would be:

    SELECT type,MIN2(price,variety),MIN(price)
    FROM fruits
    GROUP BY type

    The idea is the same that CASANO’s one but more elegant.

    MIN2 and MAX2 could be defined in a mysql UDF very easyly!!

    Jordi Baylina

    16 Apr 08 at 6:06 pm

  40. very helpful
    thanks

    jalil

    24 Apr 08 at 1:53 pm

  41. Isn’t the second method:

    select type, variety, price
    from fruits
    where price = (select min(price) from fruits as f where f.type = fruits.type

    just wrong?

    It more or less assumes that price will be unique. If the same price was repeated for 2 types of fruits and it happend to be the minumum price for one of them, the other fruit would also be returned.
    Am I right?

    Michal Talaga

    3 Jun 08 at 6:12 am

  42. hi, refer to the second-cheapest of that type, the corelated way, i don’t quite understand the count(*) for,
    and another question is, what if i’m seeking for random 2 rows instead of the cheapest 2, i tried removing the price comparison, it doesn’t work. anyone care to enlighten me? thanks

    sswong

    8 Jun 08 at 12:57 pm

  43. Thanks. You are great.
    I always come back to this page :)

    Abdul

    20 Jun 08 at 2:23 am

  44. try this out!

    create a temporary table (by using the #)

    create a cursor to select only the id of each group.

    loop until end of cursor and select top 1 record for each group
    and insert the result to the temporary.

    select all records of temporary and return the data

    that worked for me! following is the store procedure i used…

    CREATE PROCEDURE dbo.sp_GetExamsLastStatus (@FromDate datetime,
    @ToDate datetime)
    /*
    (
    @parameter1 int = 5,
    @parameter2 datatype OUTPUT
    )
    */
    AS
    – Declare the variables to store the values returned by FETCH.
    DECLARE @ExamID int, @StatusDateTime datetime, @Status smallint;
    DECLARE @PrvExamID int;

    – Create temporary table
    CREATE TABLE #ExamStatus(ExamID int, StatusDateTime datetime,Status smallint );

    – create cursor
    DECLARE Exam_Cursor CURSOR FOR
    SELECT examstatus.ExamID, StatusDateTime, Status
    FROM ExamStatus inner join Exam on examstatus.examid=exam.examid
    where exam.examdate between @FromDate and @ToDate
    ORDER BY ExamID, StatusDateTime DESC;

    – open curson
    OPEN Exam_Cursor;

    FETCH NEXT FROM Exam_Cursor
    INTO @ExamID, @StatusDateTime, @Status;
    SET @PrvExamID=0;
    WHILE @@FETCH_STATUS = 0 –loop until end of file
    BEGIN
    if @ExamID@PrvExamID –if record changed
    begin
    SET @PrvExamID=@ExamID –store new ID

    –select the top 1 ordered by date record and
    –save it to the temporary file
    insert into #ExamStatus (ExamID, StatusDateTime, Status)
    SELECT TOP (1) ExamID, StatusDateTime, Status
    FROM ExamStatus
    WHERE (ExamID = @ExamID)
    ORDER BY StatusDateTime DESC;

    END

    FETCH NEXT FROM Exam_Cursor
    INTO @ExamID, @StatusDateTime, @Status;
    END;
    CLOSE Exam_Cursor;

    DEALLOCATE Exam_Cursor;

    –show results
    –select ExamID, StatusDateTime, Status from #ExamStatus ;
    SELECT #ExamStatus.ExamID, Exam.ExamDate, Exam.StartTime, Exam.ExamHours, #ExamStatus.StatusDateTime, #ExamStatus.Status,
    ExamClass.Description AS ClassDescription
    FROM #ExamStatus INNER JOIN
    Exam ON #ExamStatus.ExamID = Exam.ExamID LEFT OUTER JOIN
    ExamClass ON Exam.SiteCenterID = ExamClass.SiteCenterID AND Exam.ClassID = ExamClass.ClassID
    ORDER BY Exam.ExamDate,Exam.StartTime;

    RETURN

    Isidoros

    20 Jun 08 at 4:59 am

  45. This is an extraordinarily useful article, and I’ll share how I used it to pull a mailing list in hopes that perhaps someone else with a similar type of issue might find practical value for the excellent techniques described in the article:

    We rented data from a data broker. In this data, you have a few hundred thousand names, and these names are generally grouped together at buildings (a B-to-B type environment) and have job titles. There are thousands of buildings, meaning that it’s not quite as simple as the fruit example, where there are only about five of them. This list doesn’t get run all the time, so performance is not as much an issue for me as it may be for others – therefore, I used the “elegant” solution listed above (just above USE UNION), as it was the clearest and easiest to modify (even if it does have to make multiple passes with a correlated subquery).

    What I did was:
    -First, rank the job titles numerically to create a hierarchy (I did this on a separate table which gets added to the FROM list in the SELECT statement described in the next step with an INNER JOIN). It’s more important for administrators to get the marketing piece than their peons, so they get a higher ranking. However: one may have the situation where many of the peons have the same job title (and the same rank), meaning that it’s more difficult to pull the top 5 ranks as 1, 2, 3, 4, 5 because you may have one 1 and six or more 4s, for instance.

    -SELECT the entire list into a separate table (can be temporary) that has an int IDENTITY column, sorting this list first by building ID and then by the job title rank. The third sort criteria (unspecified) seems based on the ID of the record which is fairly random (so if there are two fours, the one with the lower random ID number gets selected first) – but this could be specified with even more precision if desired.

    -Then use the “elegant” query to select the top x# of records at each building based on the IDENTITY column, where the lowest identity number is always the highest-ranked job-title-holders, and on down the list until you hit the x# limit.

    This ensures that no more than x# of people per building will get the mailing piece, and that at the very least the top dog(s) will get it and a few others. Other criteria can obviously be added to the sorting if desired based on whatever criteria one would like – whether they’d responded to prior offers, for instance – when it goes into the sub-table to get an IDENTITY value.

    If I want to make sure that a certain group of buildings (say, ones with more than 20 names at the location) get y# (let’s say five) of the pieces but ones with less than 10 get a lower number (x#, say three), then it is easy to SELECT these into two separate IDENTITY-set tables and then select from both of those tables with a UNION clause to get a final list.

    Hope this practical example helps someone just like this article helped me.

    M A Roberts

    23 Jul 08 at 3:03 pm

  46. This article is very useful.. I want to thank you for it..
    I’ve question please …
    How can I select the top 1 news ordered by date and time in each category without using the subqueries ..I was trying to do it with “One-pass technique” you talked about in this article but it didn’t work because the “GROUP BY” statement works before the “ORDER BY” (which I use it to order the news by date and time ) .. but I couldn’t get it ..
    Thank you ..

    AL0NE

    26 Jul 08 at 7:03 pm

  47. The article and the responses have been very helpful. I am trying to solve a somewhat related problem. In terms of the example in this article, What is the best way to query
    1) Sum of min price of all types?
    2) Sum of 2nd highest price of all types?

    Thanks.

    Sunil

    7 Aug 08 at 3:53 pm

  48. Xaprb

    8 Aug 08 at 11:41 am

  49. I wrote a nice comment but didnt see the antispam task! lost everything.
    I was so happy now im not!:-(
    well, thanks for the post anyway, helped me a lot.;-)

    Joana

    15 Aug 08 at 10:54 pm

  50. Hello -
    I need to select a ‘prf’ name type from a names table for all employees but for those who do not have a prf name, I need to get their ‘pri’ name. This seems so close to what I am doing, just cannot make it work.
    thanks,

    Samantha

    12 Sep 08 at 4:17 pm

  51. very much helpfull…
    keep it up.
    thanks a lot

    nikul

    10 Oct 08 at 11:35 am

  52. Hello,

    Hope you can help me. I’m having problems using a derived table instead of the “fruits” table in your example.
    The derived table combines 3 tables:
    Campaign
    Contact Response
    Contact

    that gives me the fields:
    Campaignname
    Campaignid
    Responsedate
    Contactid
    Prefix
    Firstname
    Lastname
    Accountid

    I wish to get this data for the earliest Responsedate per Accountid per campaignname.

    What I’ve come up with so far is:

    SELECT details.CAMPAIGNNAME,
    details.CAMPAIGNID,
    details.RESPONSEDATE,
    details.PREFIX,
    details.FIRSTNAME,
    details.LASTNAME,
    details.CONTACTID,
    details.ACCOUNTID

    FROM

    (SELECT main_table.CAMPAIGNNAME,
    main_table.ACCOUNTID,
    min(main_table.RESPONSEDATE) AS Earliest_Response_Date

    FROM
    (SELECT sysdba.CAMPAIGN.CAMPAIGNNAME,
    sysdba.CONTACTRESPONSE.CAMPAIGNID,
    sysdba.CONTACTRESPONSE.RESPONSEDATE,
    sysdba.CONTACT.PREFIX,
    sysdba.CONTACT.FIRSTNAME,
    sysdba.CONTACT.LASTNAME,
    sysdba.CONTACTRESPONSE.CONTACTID,
    sysdba.CONTACT.ACCOUNTID

    FROM sysdba.CONTACT INNER JOIN
    sysdba.CONTACTRESPONSE ON sysdba.CONTACT.CONTACTID = sysdba.CONTACTRESPONSE.CONTACTID INNER JOIN
    sysdba.CAMPAIGN ON sysdba.CONTACTRESPONSE.CAMPAIGNID = sysdba.CAMPAIGN.CAMPAIGNID

    ) AS main_table

    GROUP BY main_table.CAMPAIGNNAME,
    main_table.ACCOUNTID

    ) AS selecta INNER JOIN main_table AS details on
    details.CAMPAIGNNAME = selecta.CAMPAIGNNAME and
    details.ACCOUNTID = selecta.ACCOUNTID and
    details.RESPONSEDATE = selecta.Earliest_Response_Date;

    ——————————————————
    When I try to run this, I get the error message:

    Msg 208, Level 16, State 1, Line 1
    Invalid object name ‘main_table’.

    Thanks in advance.

    Tim

    21 Oct 08 at 6:29 am

  53. Hello -
    I need to select a ‘prf’ name type from a names table for all employees but for those who do not have a prf name, I need to get their ‘pri’ name. This seems so close to what I am doing, just cannot make it work.
    thanks,

  54. Look at COALESCE.

    Xaprb

    22 Oct 08 at 5:47 pm

  55. Excellent! Thank you for this info.

    WeinerDawg

    11 Nov 08 at 3:42 pm

  56. And another 2 solutions:

    First get the rank of one entry:

    1. Oracle/Transact/SQL-Server Compatible with rank()

    SELECT TYPE,variety,Rank() over (PARTITION BY TYPE ORDER BY variety) FROM fruit

    2. Plain SQL-Compatible (here Oracle-Syntax)
    select a.type, a.variety,count(*) as rank
    from
    (
    select TYPE,variety,price
    from fruit
    group by TYPE,variety,price
    ) a
    inner join
    (
    select TYPE,variety,price
    from fruit
    group by TYPE,variety,price
    ) b
    on a.type = b.type and a.variety <= b.variety
    group by a.type, a.variety, a.price
    order by a.type, a.variety, count(*)

    Afterwards you can select on this rank

    graf

    27 Nov 08 at 8:30 am

  57. Thank you for sharing your knowledge. I’ve been trying to work out a way to get the max time row from GPS data for mulitple vehicles (hundreds of thousands of rows a day). Using ‘Selecting the one maximum row from each group’ I got a workable solution. Now I can port it over to my Arcmap and start getting some meaningful critic of our AVL responses.

    Here’s what I ended up with if anyone is interested. I’ll need to add variables yet, but the basics work.

    select
    t2.callsign
    ,t2.unitstatus
    ,t2.datetimestamp
    ,t2.AssignedIncident
    ,t2.Latitude
    ,t2.Longitude
    ,t2.Speed
    ,t2.Heading
    ,t2.SVCount
    ,t2.Source
    ,t2.AgeOfData
    From
    (select callsign, max(DateTimeStamp)as maxtime
    from dbo.HISTORY_200812
    where CallSign like ‘LALA%’and DateTimeStamp <= ‘2008-12-04 22:13:00′
    group by CallSign ) as t1
    inner join
    dbo.HISTORY_200812 as t2 on t2.callsign=t1.callsign and t2.datetimestamp=t1.maxtime
    order by
    t2.callsign

    Thom W

    12 Dec 08 at 7:45 pm

  58. Please Help
    I have the same problem but with two tables.
    Table 1. is called products
    Table 2 is called ProductVariations.

    Both have a field called ProductID (this is the primary key for the products table.

    The variations table has the price.
    I want a simple query that will Get me only one product variation per product. That being the lowest price.

    Please help.

    I dont know if i should cut an paste the SQL code to generate these tables????

    frank barbito

    20 Dec 08 at 12:07 pm

  59. what if i wanted the min and max amount in one row?

    juna

    12 Jan 09 at 4:53 am

  60. Thanks for this very clearly written and helpful post.

    Steve

    17 Jan 09 at 3:29 pm

  61. [...] How to select the first/least/max row per group in SQL at Xaprb [...]

  62. THANKS A LOT. THE ARTICLE WAS REALLY HELPFUL.

    AMITABH

    9 Mar 09 at 3:50 pm

  63. this was really helpful…………………….

    Nirmo

    17 Mar 09 at 5:25 am

  64. Thank you so much for this article.

    I can’t get the method with user variables to work correctly (I suspect there’s aliasing going on with the multiple JOINs and ORDER BYs I need to do), and I need to ORDER BY floating-point values so I can’t use Cristiano’s method. Instead, I derived a different method:

    SELECT type,
    SUBSTRING_INDEX(GROUP_CONCAT(variety ORDER BY price), ‘,’, 1) AS cheapest_variety,
    SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY price), ‘,’, 1) AS cheapest_price
    FROM fruits
    GROUP BY type;

    GROUP_CONCAT() concatenates all members of the group, in the specified order, while SUBSTRING_INDEX() pulls out the element of interest.

    I’m not sure if the above example is doing an extra ORDER BY (so as to extract the price of the cheapest variety), but in my case, I’m happy enough to get a unique identifier for the row (so I only need a single ORDER BY). I guess if you want values from the entire row returned without multiple ORDER BYs, you could CONCAT_WS() the fields with a different separator than is used by GROUP_CONCAT, and then parse the separate fields out later.

    Paul Price

    24 Mar 09 at 3:04 pm

  65. Great website, great article. VERY HELPFUL SOLUTIONS and theory!

    Michael B

    24 Mar 09 at 5:51 pm

  66. I wish to know how I can find A – B (The data in A but not in B) using JOIN.

    table_A

    name
    —-
    A
    B
    C

    table_B

    name
    —-
    B
    D
    E

    Thanks in advance

    ABRAHAM

    5 Apr 09 at 3:52 pm

  67. ABRAHAM,you can do it by left join.

    SELECT a.name FROM a LEFT JOIN b USING(name)
    WHERE b.name IS NULL

    angel

    6 Apr 09 at 3:04 am

  68. I want to know how the latest records updated for a column in a table can be retrieved from joining 3 tables together.
    eg :
    Table 1
    A1 pk1
    B1 pk2
    C1 pk3

    Table 2
    N1 date1 2_pk1
    N2 date2 2_pk2
    N3 date3 2_pk3
    N4 date1 2_pk4
    N5 date2 2_pk5
    Table 3
    pk1 2_pk1
    pk2 2_pk2
    pk3 2_pk3
    pk4 2_pk4

    Now in the above 3 tables I want fetch ‘date1′ of table 2 for each of A1 B1 and C1 in table1.I need to join the third table as wel.

    Please post your suggestions asap.Thanks in advance.

    Vish

    8 Apr 09 at 8:33 am

  69. We are creating a concerts web site. Each concert has various show times (1-to-many). When viewing a concert summary, we want to show From min date to max date, like May 29 – May 30 Still need to figure out how to hide the second month string if all show times are in the same month.

    Here’s a query for mysql that’s based on the simple self-join in this great tuturial. Sorry I don’t have time to simplify the field names to make it clearer.

    Hope it helps a passer-by like this tutorial helped me.

    select n.nid, n.title, x.minDate, y.maxDate from node n inner join
    (select node.nid, min(cte.field_datestamp_value) as minDate from node
    inner join (og_ancestry oga inner join content_type_event cte on oga.nid = cte.nid) on node.nid = oga.group_nid
    where (DATE_FORMAT(ADDTIME(FROM_UNIXTIME(cte.field_datestamp_value), SEC_TO_TIME(-25200)), ‘%Y-%m-%d’) >= ‘2009-06-27′)
    group by node.nid) as x on n.nid = x.nid
    inner join
    (select node.nid, max(cte.field_datestamp_value) as maxDate from node
    inner join (og_ancestry oga inner join content_type_event cte on oga.nid = cte.nid) on node.nid = oga.group_nid
    where (DATE_FORMAT(ADDTIME(FROM_UNIXTIME(cte.field_datestamp_value), SEC_TO_TIME(-25200)), ‘%Y-%m-%d’) >= ‘2009-06-27′)
    group by node.nid) as y on y.nid = n.nid

    Darko

    17 Apr 09 at 1:53 am

  70. Hello All,

    I have 2 tables.

    1) names

    id | name
    —-+———
    1 | A
    1 | BB
    1 | CCC
    2 | AAA-BBB
    5 | FIVE

    and b) criteria

    id | min | max
    —-+—–+—–
    1 | 1 | 2
    2 | 1 | 1

    I wish to get all rows from “names” which satisfy the min, max condition from “criteria” and rows with id’s resent in “names” but not in “criteria”

    id | name
    —-+———
    1 | A
    1 | B
    5 | FIVE

    I believe it should be a LEFT JOIN …not 100% sure.

    Thanks & Regards
    Roxana

    Roxana ABRAHAM

    4 May 09 at 6:00 pm

  71. Is that wat you need?

    select N.Id,N.Name
    from names N
    left outer join criteria C on N.Id=C.Id
    where (C.id is NULL) or
    ( (length(name)>=C.min) and
    (length(name)<=C.max)
    )

    Jordi Baylina

    5 May 09 at 3:04 am

  72. This is great, it helped me figure out how to write my query….although once I saw your second solution, I realized I had known all along, but just had brain fog. Nevertheless, I appreciate being able to find clever solutions to this problem!

    Put more information about fancy SQL out there, I love it!

    Maggi Tumulty

    9 Jun 09 at 9:04 am

  73. Very interesting. Thanks for putting this online.

    nir levy

    15 Jun 09 at 2:54 am

  74. Below are 3 fields and I want to group rows ,condition as follows.

    Show row for latest date (for field B )
    by latest date I mean 06/28/2002 >01/14/2001 , 06/28/2002 is latest date.
    If
    there are two latest date
    then
    take field C and find out latest date rows in C

    See below for more explanation.

    Table(TB ) has following fields( just giving fields in question):
    Code:

    A B C
    2 06/28/2002 01/14/2001
    2 11/21/1997 04/25/1995
    2 09/19/2006 12/27/2002 <— B has latest date ,so we show this row

    3 01/23/2001 11/22/1999
    3 02/19/2007 10/23/2002 <— B has latest date , so show this row
    3 17/06/2000 09/14/2002

    4 10/25/2002 06/22/2001
    4 10/25/2002 04/24/1980

    04/24/2001, so show this row.

    4 04/25/2002 05/12/1981
    4 03/13/1999 02/11/1985

    5 06/06/1998 05/05/1995

    The output should be :

    A B C
    2 09/19/2006 12/27/2002
    3 02/19/2007 10/23/2002
    4 10/25/2002 06/22/2001
    5 06/06/1998 05/05/1995

    The query for just finding latest of B is as followed
    Code:

    SELECT A, MAX(B),C
    FROM TB
    GROUP BY A
    WITH UR;

    I want to extend it to above requirement.

    Tokci

    17 Jun 09 at 4:46 am

  75. Somehow att the formatting is gone..
    Some more details which missed out is as followed

    a b c
    4 10/25/2002 06/22/2001
    4 10/25/2002 04/24/1980

    04/24/2001, so show this row.

    Tokci

    17 Jun 09 at 4:49 am

Leave a Reply