How to number rows in MySQL

I wrote before about a generic, cross-platform way to simulate the SQL ROW_NUMBER() function in any RDBMS. There is a much more efficient way to do this on MySQL with user variables.

Background

Please see my previous article on how to simulate the ROW_NUMBER() function for the background. I’ll use the same table structure and data in this article.

Unfortunately, that’s a quadratic algorithm, so it’s not something I’d do much (though I once did it over small sets of data in SQL Server 2000 at a jobsite).

A more efficient method

In MySQL you can simultaneously assign to and read from user variables in a SELECT statement. This allows the following method of numbering rows:

set @type = '';
set @num  = 1;

select
   type,
   variety,
   @num := if(@type = type, @num + 1, 1) as row_number,
   @type := type as dummy
from fruits;

+--------+------------+------------+--------+
| type   | variety    | row_number | dummy  |
+--------+------------+------------+--------+
| apple  | fuji       |          1 | apple  | 
| apple  | gala       |          2 | apple  | 
| apple  | limbertwig |          3 | apple  | 
| cherry | bing       |          1 | cherry | 
| cherry | chelan     |          2 | cherry | 
| orange | navel      |          1 | orange | 
| orange | valencia   |          2 | orange | 
| pear   | bartlett   |          1 | pear   | 
| pear   | bradford   |          2 | pear   | 
+--------+------------+------------+--------+

How does that work? I’m restarting the row number each time the type column changes, by keeping track of the value it had in the last row. And I’m simultaneously incrementing and selecting the row number in each row.

The spurious dummy column has to be there, but if your version of MySQL supports it, you can use a subquery in the FROM clause to eliminate columns you don’t want in the results.

Efficiency

All I’m doing is maintaining a bit of extra memory and performing a few small comparisons and assignments for each row, so this technique is very efficient.

Playing with fire

You can refer to the generated row_number column in a HAVING or GROUP BY clause, but don’t burn your fingers. This technique is very much like playing with fire. The result of assigning to a variable and using it in the same statement (in the HAVING, for example) depends on the query plan the server chooses, the phase of the moon, and probably other things too. Before you use this technique, you should read and understand the section on user-defined variables in the MySQL Manual, and decide whether it’s safe for your query.

Now that you’ve read that section of the manual, particularly the part about the aliased expression, you should understand why the following query might be a safer paradigm when using the result in the HAVING clause, even though it produces another dummy column:

set @type = '';
set @num  = 1;

select
   type,
   variety,
   @num := if(@type = type, @num + 1, 1) as dummy_1,
   @type := type as dummy_2,
   @num as row_number
from fruits
group by type, variety
having row_number = 1;

+--------+----------+---------+---------+------------+
| type   | variety  | dummy_1 | dummy_2 | row_number |
+--------+----------+---------+---------+------------+
| apple  | fuji     |       1 | apple   | 1          | 
| cherry | bing     |       1 | cherry  | 1          | 
| orange | navel    |       1 | orange  | 1          | 
| pear   | bartlett |       1 | pear    | 1          | 
+--------+----------+---------+---------+------------+

(If I’m wrong about that, somebody please correct me).

A safer technique is to use a subquery in the FROM clause. This will cause the results to be materialized in a temporary table behind the scenes. It might be less efficient for some uses, though:

select type, variety
from (
   select
      type,
      variety,
      @num := if(@type = type, @num + 1, 1) as row_number,
      @type := type as dummy
   from fruits
) as x
where row_number = 1;

+--------+----------+
| type   | variety  |
+--------+----------+
| apple  | fuji     | 
| cherry | bing     | 
| orange | navel    | 
| pear   | bartlett | 
+--------+----------+

Conclusion

This is an efficient, flexible way to generate and use row numbers in MySQL. I’ll leave it to you to find uses for it for right now, but I’m going to show you at least one application for this in an upcoming article.

Technorati Tags:, ,

You might also like:

  1. How to simulate the SQL ROW_NUMBER function
  2. How to select the first/least/max row per group in SQL
  3. Advanced MySQL user variable techniques
  4. How to select the first or last row per group in SQL
  5. How to simulate the GROUP_CONCAT function

16 Responses to “How to number rows in MySQL”


  1. 1 nah

    “There is a much more efficient way to do this on MySQL”.

    Pfff … as always, MySQL fan-boys don’t give any fact about the supossed ‘efficiency’ of their ‘magic sql’.

    Anyway, it’s funny. Those fan-boys seem to discover, every day, something ‘hyper-mega-cool’ about RDBMS (well, if you consider MySQL like a RDBMS, of course, LOL).

  2. 2 Xaprb

    You make a very good point… I don’t provide any hard numbers about efficiency. And I’m the one who says you should always do real measurements :-) It’s impossible to say much about query performance on such a small data set, so I added another 100 rows for each type of fruit:

    insert into fruits(type, variety)
       select type, concat(variety, i)
    from fruits
       cross join mysql.number
    where i < 100;

    Now the queries perform differently enough to appreciate the linear vs. quadratic algorithms. This is with a totally in-memory dataset; if it were too large to fit in memory, it would be even clearer:

    +----------------------------------------------------------+
    |                   QUERY 1 (0.1816 sec)                   |
    +----------------------------------------------------------+
    select l.type, l.variety, count(*) as num from fruits as ...
    
    __ Overall stats ___________________________________________
       Elapsed time                              0.182
       Optimizer cost                        11180.424
    
    __ Table and index accesses ________________________________
       Table locks acquired                      2
       Table scans                               1
    
    __ Row operations ____________________ Handler
       Reads                                211801
         Fixed pos (might be sort)               0
         Next row (table scan)                   0
         Bookmark lookup                       900
         First in index (full scan?)             1
         Next in index                      210900
         Prev in index                           0
    
    __ I/O Operations _____________________ Memory ________ Disk
       Key cache
         Key reads                           11645             0
    
    +----------------------------------------------------------+
    |                   QUERY 3 (0.0015 sec)                   |
    +----------------------------------------------------------+
    select type, variety, @num := if(@type = type, @num + 1, ...
    
    __ Overall stats ___________________________________________
       Elapsed time                              0.002
       Optimizer cost                          186.578
    
    __ Table and index accesses ________________________________
       Table locks acquired                      1
       Table scans                               1
    
    __ Row operations ____________________ Handler
       Reads                                   901
         Fixed pos (might be sort)               0
         Next row (table scan)                   0
         Bookmark lookup                         0
         First in index (full scan?)             1
         Next in index                         900
         Prev in index                           0
    
    __ I/O Operations _____________________ Memory ________ Disk
       Key cache
         Key reads                              45             0
  3. 3 Sheeri

    In the last article you specified it, and it’s worth repeating here — you want to order by type, because the numbering restarts at any change — it doesn’t magically know the count for “apple” or “cherry”, it just resets on a change.

  4. 4 Xaprb

    That is a good point. The rows are numbered in the order they come back to the client, which is precisely the thing that’s hard to do without this solution. If you want the rows in a different order, you just say ORDER BY whatever, and they’re numbered in the order they’re returned.

  5. 5 Nicholas Thompson

    VERY nice article! You just saved me hours (well at least many minutes) of tedius work. I adapted the incrementing variable technique you used above to reorder the weights of menu items in Drupal (an Open Source CMS) so the items appears alphabetically…

    I posted how I did it on my blog Ordering Menu Items Alphabetically in Drupal.

  6. 6 Unomi

    To just count the rows incremently use something like this:

    SELECT a.*, @num := @num + 1 b from test a, (SELECT @num := 0) d;

    This results in having table ‘a’ with an additional column ‘b’ wich just counts the rows. Setting @num in the sub-selection makes it simple to put it into 1 query. The incrementing number can be adjusted.

    After the ‘d’ (wich stands for derived and it is mandatory to have an alias), you can order, group by etc. the column ‘b’ still adds up.

    Hope this helps people searching for a row number solution……

    - Unomi -

  7. 7 Xaprb

    That’s a great solution.

  8. 8 Alex

    Anyone have a solution for SQLite? It doesn’t have user-defined variables, sadly.

  9. 9 Rafael Vale

    Yo there!
    Your posts helps me very much, thank u for that!
    I have a little problem and probably somebody here can help me!
    I need to select the entries of a table. But i want only the MAX(id) from each product_id of that table.
    i’m trying:
    SELECT max(id), id_pit, id_atend FROM table GROUP BY id_pit

    but the result is the MAX id with the id_atend from another id.

    How can i do it?

    Thanks for atention!

  10. 10 Xaprb

    You need to read two of my other articles: the “next” article linked from the top of this page, on finding the first row per group, and this one about incorrect GROUP BY, which you are doing and is causing you trouble.

  11. 11 Nick T.

    I’m a new MySQL/PHP user, so please bear with me. The information you provided, Xaprb, is excellent. But is there a way I can make the row_number column a permanent column in my database instead of just output?

    I have a table very similar to that of “fruits”, with several broad “types” and specific “varieties” within those “types”. I need to add another column numbering these types but the table is just too big to do it by hand. The solution you mentioned above is great for output, but how can I add the column into the permanent table itself?

    I tried to make a MySQL dump file of the query you gave above, and then import that into my table using PHPMyAdmin, but I kept getting errors. Is there another way?

  12. 12 Xaprb

    The simplest way is to make a multiple-column primary key on (type, variety, ai) where ai is an auto_increment integer. This only works on MyISAM, though.

  13. 13 rackoon

    If I’m not mistaken, there is no guarantee that this method works. In the mysql-manual it says:

    —SNIP—

    The order of evaluation for user variables is undefined and may change based on the elements contained within a given query. In SELECT @a, @a := @a 1 …, you might think that MySQL will evaluate @a first and then do an assignment second, but changing the query (for example, by adding a GROUP BY, HAVING, or ORDER BY clause) may change the order of evaluation.

    The general rule is never to assign a value to a user variable in one part of a statement and use the same variable in some other part of the same statement. You might get the results you expect, but this is not guaranteed.

    —SNIP—

    So how do you know that in your first example “if(@type = type, @num 1, 1)” is evaluated before you reassign @type via “@type := type”?

  14. 14 Xaprb

    The manual is vague and inaccurate about this point. You have to read the source code and learn how the MySQL query executioner (yes, it’s actually called that!) works.

  15. 15 tadwook

    thanks rackoon — THIS METHOD DOES NOT WORK WITH order by — just FYI

  16. 16 Allison

    Thanks, Unomi! Your solution works great for me… I was looking for an alternative to SQL Server 2005’s row_number() function.

Leave a Reply

Please do not use this blog to get help with problems or bugs in Maatkit or innotop: use the Sourceforge forums, mailing list, or bug trackers. If you're asking for help with MySQL, please use the MySQL mailing list instead. I'm writing a book and my time is extremely limited :-)