How to select the Nth greatest/least/first/last row in SQL

This is a continuation of my articles on how to select the desired rows from ranked data. A user recently posed a question in the comments that I thought was particularly intriguing:

What is the best way to query 1) Sum of min price of all types? 2) Sum of 2nd highest price of all types?

Sounds like fun! Let me start by saying the sum is the easy part. You can always do that like so:

select sum(price) from (
   -- find desired rows here
) as x;

Finding the desired rows is the hard part. In my previous articles I focused on extrema:

  • The single biggest/smallest/extremest row in each group. (Pretty easy.)
  • The N most extreme rows in each group. (Doable, but harder.)

In this article, we’re going to see how to get not the most extreme row, not the N most extreme rows, but — hold your breath — the single Nth most extreme row per group. (In a future article I might talk about how to get the Nth through Mth most extreme rows.)

The setup

Let’s create some sample data to get started.

drop table if exists fruits;

create table fruits (
   type varchar(20) not null,
   variety varchar(20) not null,
   price int not null,
   primary key (type, variety)
);

insert into fruits values
('apple',  'fuji',       1),
('apple',  'gala',       2),
('apple',  'limbertwig', 3),
('cherry', 'bing',       4),
('cherry', 'chelan',     5),
('orange', 'navel',      6),
('orange', 'valencia',   7),
('pear',   'bartlett',   8),
('pear',   'bradford',   9);

For convenience so it’s easier to see how they are ordered, I’ve just ordered the fruits alphabetically and given them unique prices.

The desired results — second-cheapest prices for each fruit — are as follows:

+--------+-----------------+
| type   | second_cheapest |
+--------+-----------------+
| apple  |               2 | 
| cherry |               5 | 
| orange |               7 | 
| pear   |               9 | 
+--------+-----------------+

The solution

The intuition you need here is that if you get the 2 cheapest fruits in each group, and then take the single most extreme from each group, you can get the Nth offset. Let’s begin with one of the queries from my earlier article. (You should be able to use any of them. I’m just using this one because it’s convenient and pretty clear.)

select type, variety, price
from fruits
where (
   select count(*) from fruits as f
   where f.type = fruits.type and f.price < fruits.price
) <= 1;
+--------+----------+-------+
| type   | variety  | price |
+--------+----------+-------+
| apple  | fuji     |     1 | 
| apple  | gala     |     2 | 
| cherry | bing     |     4 | 
| cherry | chelan   |     5 | 
| orange | navel    |     6 | 
| orange | valencia |     7 | 
| pear   | bartlett |     8 | 
| pear   | bradford |     9 | 
+--------+----------+-------+

The result is the 2 cheapest fruits from each type. (Notice that all we really did was eliminate one row — the most expensive apple.) Now let’s get the second cheapest — and what is that? It’s simply the most expensive of the fruits we found in that query. And that’s just a MAX().

select type, max(price) as second_cheapest
from (
   select type, variety, price
   from fruits
   where (
      select count(*) from fruits as f
      where f.type = fruits.type and f.price < fruits.price
   ) <= 1
) as x
group by type;
+--------+-----------------+
| type   | second_cheapest |
+--------+-----------------+
| apple  |               2 | 
| cherry |               5 | 
| orange |               7 | 
| pear   |               9 | 
+--------+-----------------+

That’s it!

Sum of the second cheapest

By now you probably see the pattern: do it one step at a time, turning each thing into a simpler question that’s easy to answer. So how do we sum the second cheapest prices for each type of fruit? First, we find them (done!), then we sum them.

select sum(second_cheapest) from (
   select type, max(price) as second_cheapest
   from (
      select type, variety, price
      from fruits
      where (
         select count(*) from fruits as f
         where f.type = fruits.type and f.price < fruits.price
      ) <= 1
   ) as x
   group by type
) as y;
+----------------------+
| sum(second_cheapest) |
+----------------------+
|                   23 | 
+----------------------+

Conclusion

In this post I showed you how to decompose the problem into simpler and simpler pieces. Often what’s hardest about a complex query is trying to do it all at once. I have lots of tips elsewhere on this blog about how to make things faster — this is not a particularly fast query — but here I just wanted to show how to get the correct answer.

Technorati Tags:

You might also like:

  1. How to number rows in MySQL
  2. How to select the first/least/max row per group in SQL
  3. How to simulate the SQL ROW_NUMBER function
  4. Advanced MySQL user variable techniques
  5. How to select the first or last row per group in SQL

6 Responses to “How to select the Nth greatest/least/first/last row in SQL”


  1. 1 Eran Galperin

    I would like to know how to control joined rows per join condition. A common requirement is to pull rows from a table and to fetch some corresponding rows in a child table - but only 1 row per join condition, typically the most recent.

    For example a forum application would like to list a list of threads with some data on the last post in the thread. How can that be accomplished in a normalized setup?

  2. 2 Xaprb

    Approach it the other direction.

    First, get the data on the last post in each thread. See my earlier posts for this.

    Then join to the threads.

  3. 3 Gints

    As soon as MySQL will add analytic functions, it will be muuuuch easier :)

  4. 4 Jordi Baylina

    A simplification:

    select sum(price)
    from fruits f1
    where
    ( select count(*)
    from fruits f2
    where f1.type=f2.type and
    f2.price <= f1.price
    ) = 2

    If can exists multiple varieties with the same price then:

    select sum(price) from fruits f1
    where (
    select count(*)
    from fruits f2
    where f1.type=f2.type and
    ((f2.price < f1.price) or ( (f2.price=f1.price) and
    (f2.variety < f1.variety)))

    ) =1

  5. 5 Rijk van Wel

    Nice article, but isn’t the 2nd cheapest the _least_ expensive of the fruits found in the first query (so “min(price) as second_cheapest”)?

  6. 6 Xaprb

    Nope, because the first query finds the two cheapest prices for each fruit, so that would effectively be “find the min of the min” which is… the min, or the 1st cheapest.

    Try it and see!

Leave a Reply

Please do not use this blog to get help with problems or bugs in Maatkit or innotop: use the appropriate forums, mailing list, or bug trackers. If you're asking for help with MySQL, please use the MySQL mailing list instead.