I wrote before about a generic, cross-platform way to simulate the SQL ROW_NUMBER() function in any RDBMS. There is a much more efficient way to do this on MySQL with user variables.
Background
Please see my previous article on how to simulate the ROW_NUMBER() function for the background. I’ll use the same table structure and data in this article.
Unfortunately, that’s a quadratic algorithm, so it’s not something I’d do much (though I once did it over small sets of data in SQL Server 2000 at a jobsite).
A more efficient method
In MySQL you can simultaneously assign to and read from user variables in a SELECT statement. This allows the following method of numbering rows:
set @type = ''; set @num = 1; select type, variety, @num := if(@type = type, @num + 1, 1) as row_number, @type := type as dummy from fruits; +--------+------------+------------+--------+ | type | variety | row_number | dummy | +--------+------------+------------+--------+ | apple | fuji | 1 | apple | | apple | gala | 2 | apple | | apple | limbertwig | 3 | apple | | cherry | bing | 1 | cherry | | cherry | chelan | 2 | cherry | | orange | navel | 1 | orange | | orange | valencia | 2 | orange | | pear | bartlett | 1 | pear | | pear | bradford | 2 | pear | +--------+------------+------------+--------+
How does that work? I’m restarting the row number each time the type column changes, by keeping track of the value it had in the last row. And I’m simultaneously incrementing and selecting the row number in each row.
The spurious dummy column has to be there, but if your version of MySQL supports it, you can use a subquery in the FROM clause to eliminate columns you don’t want in the results.
Efficiency
All I’m doing is maintaining a bit of extra memory and performing a few small comparisons and assignments for each row, so this technique is very efficient.
Playing with fire
You can refer to the generated row_number column in a HAVING or GROUP BY clause, but don’t burn your fingers. This technique is very much like playing with fire. The result of assigning to a variable and using it in the same statement (in the HAVING, for example) depends on the query plan the server chooses, the phase of the moon, and probably other things too. Before you use this technique, you should read and understand the section on user-defined variables in the MySQL Manual, and decide whether it’s safe for your query.
Now that you’ve read that section of the manual, particularly the part about the aliased expression, you should understand why the following query might be a safer paradigm when using the result in the HAVING clause, even though it produces another dummy column:
set @type = ''; set @num = 1; select type, variety, @num := if(@type = type, @num + 1, 1) as dummy_1, @type := type as dummy_2, @num as row_number from fruits group by type, variety having row_number = 1; +--------+----------+---------+---------+------------+ | type | variety | dummy_1 | dummy_2 | row_number | +--------+----------+---------+---------+------------+ | apple | fuji | 1 | apple | 1 | | cherry | bing | 1 | cherry | 1 | | orange | navel | 1 | orange | 1 | | pear | bartlett | 1 | pear | 1 | +--------+----------+---------+---------+------------+
(If I’m wrong about that, somebody please correct me).
A safer technique is to use a subquery in the FROM clause. This will cause the results to be materialized in a temporary table behind the scenes. It might be less efficient for some uses, though:
select type, variety
from (
select
type,
variety,
@num := if(@type = type, @num + 1, 1) as row_number,
@type := type as dummy
from fruits
) as x
where row_number = 1;
+--------+----------+
| type | variety |
+--------+----------+
| apple | fuji |
| cherry | bing |
| orange | navel |
| pear | bartlett |
+--------+----------+
Conclusion
This is an efficient, flexible way to generate and use row numbers in MySQL. I’ll leave it to you to find uses for it for right now, but I’m going to show you at least one application for this in an upcoming article.
Technorati Tags:mysql, sql, user defined variables
“There is a much more efficient way to do this on MySQL”.
Pfff … as always, MySQL fan-boys don’t give any fact about the supossed ‘efficiency’ of their ‘magic sql’.
Anyway, it’s funny. Those fan-boys seem to discover, every day, something ‘hyper-mega-cool’ about RDBMS (well, if you consider MySQL like a RDBMS, of course, LOL).
You make a very good point… I don’t provide any hard numbers about efficiency. And I’m the one who says you should always do real measurements :-) It’s impossible to say much about query performance on such a small data set, so I added another 100 rows for each type of fruit:
Now the queries perform differently enough to appreciate the linear vs. quadratic algorithms. This is with a totally in-memory dataset; if it were too large to fit in memory, it would be even clearer:
+----------------------------------------------------------+ | QUERY 1 (0.1816 sec) | +----------------------------------------------------------+ select l.type, l.variety, count(*) as num from fruits as ... __ Overall stats ___________________________________________ Elapsed time 0.182 Optimizer cost 11180.424 __ Table and index accesses ________________________________ Table locks acquired 2 Table scans 1 __ Row operations ____________________ Handler Reads 211801 Fixed pos (might be sort) 0 Next row (table scan) 0 Bookmark lookup 900 First in index (full scan?) 1 Next in index 210900 Prev in index 0 __ I/O Operations _____________________ Memory ________ Disk Key cache Key reads 11645 0 +----------------------------------------------------------+ | QUERY 3 (0.0015 sec) | +----------------------------------------------------------+ select type, variety, @num := if(@type = type, @num + 1, ... __ Overall stats ___________________________________________ Elapsed time 0.002 Optimizer cost 186.578 __ Table and index accesses ________________________________ Table locks acquired 1 Table scans 1 __ Row operations ____________________ Handler Reads 901 Fixed pos (might be sort) 0 Next row (table scan) 0 Bookmark lookup 0 First in index (full scan?) 1 Next in index 900 Prev in index 0 __ I/O Operations _____________________ Memory ________ Disk Key cache Key reads 45 0In the last article you specified it, and it’s worth repeating here — you want to order by type, because the numbering restarts at any change — it doesn’t magically know the count for “apple” or “cherry”, it just resets on a change.
That is a good point. The rows are numbered in the order they come back to the client, which is precisely the thing that’s hard to do without this solution. If you want the rows in a different order, you just say ORDER BY whatever, and they’re numbered in the order they’re returned.
VERY nice article! You just saved me hours (well at least many minutes) of tedius work. I adapted the incrementing variable technique you used above to reorder the weights of menu items in Drupal (an Open Source CMS) so the items appears alphabetically…
I posted how I did it on my blog Ordering Menu Items Alphabetically in Drupal.
To just count the rows incremently use something like this:
This results in having table ‘a’ with an additional column ‘b’ wich just counts the rows. Setting @num in the sub-selection makes it simple to put it into 1 query. The incrementing number can be adjusted.
After the ‘d’ (wich stands for derived and it is mandatory to have an alias), you can order, group by etc. the column ‘b’ still adds up.
Hope this helps people searching for a row number solution……
- Unomi -
That’s a great solution.
Anyone have a solution for SQLite? It doesn’t have user-defined variables, sadly.
Yo there!
Your posts helps me very much, thank u for that!
I have a little problem and probably somebody here can help me!
I need to select the entries of a table. But i want only the MAX(id) from each product_id of that table.
i’m trying:
SELECT max(id), id_pit, id_atend FROM table GROUP BY id_pit
but the result is the MAX id with the id_atend from another id.
How can i do it?
Thanks for atention!
You need to read two of my other articles: the “next” article linked from the top of this page, on finding the first row per group, and this one about incorrect GROUP BY, which you are doing and is causing you trouble.
I’m a new MySQL/PHP user, so please bear with me. The information you provided, Xaprb, is excellent. But is there a way I can make the row_number column a permanent column in my database instead of just output?
I have a table very similar to that of “fruits”, with several broad “types” and specific “varieties” within those “types”. I need to add another column numbering these types but the table is just too big to do it by hand. The solution you mentioned above is great for output, but how can I add the column into the permanent table itself?
I tried to make a MySQL dump file of the query you gave above, and then import that into my table using PHPMyAdmin, but I kept getting errors. Is there another way?
The simplest way is to make a multiple-column primary key on (type, variety, ai) where ai is an auto_increment integer. This only works on MyISAM, though.
If I’m not mistaken, there is no guarantee that this method works. In the mysql-manual it says:
—SNIP—
The order of evaluation for user variables is undefined and may change based on the elements contained within a given query. In SELECT @a, @a := @a 1 …, you might think that MySQL will evaluate @a first and then do an assignment second, but changing the query (for example, by adding a GROUP BY, HAVING, or ORDER BY clause) may change the order of evaluation.
The general rule is never to assign a value to a user variable in one part of a statement and use the same variable in some other part of the same statement. You might get the results you expect, but this is not guaranteed.
—SNIP—
So how do you know that in your first example “if(@type = type, @num 1, 1)” is evaluated before you reassign @type via “@type := type”?
The manual is vague and inaccurate about this point. You have to read the source code and learn how the MySQL query executioner (yes, it’s actually called that!) works.
thanks rackoon — THIS METHOD DOES NOT WORK WITH order by — just FYI
Thanks, Unomi! Your solution works great for me… I was looking for an alternative to SQL Server 2005’s row_number() function.