Xaprb

Stay curious!

How to implement a queue in SQL

with 21 comments

This article explains how to create a fixed-size FIFO (first-in, first-out) queue in SQL, where rows added after a threshold will cause the oldest row to be deleted. There are several ways to do this, but MERGE on Oracle and DB2, and MySQL’s non-standard extensions to SQL, make an elegant solution easy.

Update a PostgreSQL blogger pointed out the obvious method I missed: triggers! There’s also a really neat PG-specific feature that allows it to work even more elegantly on that platform. Well worth a read. I sometimes wish I worked at a PostgreSQL shop so I could have time to learn as much about it as I’ve learned about MySQL.

Method one: do it with a single query in MySQL

Since I’m most familiar with MySQL, I’ll explain it in detail for MySQL. I’m sure a competent Oracle or DB2 developer can translate it to those platforms.

It is not possible to simultaneously INSERT and DELETE in standard SQL. However, in MySQL it is possible to simultaneously INSERT and UPDATE, with the ON DUPLICATE KEY UPDATE syntax. Another way to do it is with REPLACE, which actually works as a DELETE and INSERT. I’ve written about these before (flexible INSERT and UPDATE statements in MySQL — one of my most popular articles). You may not like them because they’re non-standard, but they’re available. I believe in using what my tools give me.

For these queries to work, you need to let inserts proceed as normal until the limit (say, 5) is reached. After that, new inserts need to create a unique index violation, and then the two-things-at-once functionality of the statement kicks in.

There are probably many ways to do this, but for this article, I’m going to imagine the table as a fixed-size, fixed-order queue. Once the list is full, new inserts “wrap around” and begin again at the bottom, travelling up through the rows one at a time. Each new insert then over-writes an existing row until it reaches the top and wraps around again.

For simplicity’s sake, I’m also going to imagine that nothing ever deletes any rows from this table. That way there’ll be no gaps in the sequence I’ll use to make the wrap-around work.

Here’s the table definition:

CREATE TABLE q (
   id int NOT NULL,
   modulo int NOT NULL,
   fruit varchar(10) NOT NULL,
   PRIMARY KEY(id),
   UNIQUE KEY(modulo)
)

This table has two unique keys: one to serve as a monotonically increasing “row number,” and one to cause the wrap-around effect to work. The only real data is the fruit column. Here’s a query to insert “apples” into the queue:

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'apples'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit)

Here’s what the query does: it finds the maximum value of id in the table, which ought to be efficient since it’s indexed. If there are no rows, the result will be NULL, which COALESCE() converts into -1. Then it adds one to that value, which will become the next largest value in the id sequence. What I’m really doing here is rolling my own AUTO_INCREMENT, with a slight twist: the sequence starts at zero, not one.

The sequence needs to start at zero to make the modulo arithmetic easy to understand and work with. At the same time I’m inserting that value into id, I’m also dividing it by the desired size of the table, and inserting the remainder into the modulo column. When the table gets “full,” that column will already contain the calculated value, and there’ll be a unique index violation. Then the ON DUPLICATE KEY UPDATE clause will fire and update the existing row instead of inserting a new one.

Here’s what the table contains after the above insert:

select * from q;
+----+--------+--------+
| id | modulo | fruit  |
+----+--------+--------+
|  0 |      0 | apples |
+----+--------+--------+

Let me now insert four more rows for oranges, peaches, cherries and pears, so the queue is full:

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'oranges'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit);

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'peaches'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit);

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'cherries'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit);

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'pears'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit);

Each row I inserted caused MySQL to print the following back to the command prompt:

Query OK, 1 row affected (0.05 sec)
Records: 1  Duplicates: 0  Warnings: 0

And now, the contents of the table:

select * from q;
+----+--------+----------+
| id | modulo | fruit    |
+----+--------+----------+
|  0 |      0 | apples   |
|  1 |      1 | oranges  |
|  2 |      2 | peaches  |
|  3 |      3 | cherries |
|  4 |      4 | pears    |
+----+--------+----------+

Now I’ll insert another row for bananas:

insert into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'bananas'
   from q
      on duplicate key update
         id    = values(id),
         fruit = values(fruit);

That query should have wrapped around to the beginning of the queue and triggered the unique index violation. As a result, MySQL should have overwritten the ‘apples’ row. In fact, the messages at the command prompt indicate something did happen:

Query OK, 2 rows affected (0.03 sec)
Records: 1  Duplicates: 1  Warnings: 0

Two rows were “affected” because of the duplicate key (you can read the MySQL manual for more on what “rows affected” really means). And there was indeed a duplicate row. Here’s what’s in the table now:

select * from q order by modulo;
+----+--------+----------+
| id | modulo | fruit    |
+----+--------+----------+
|  5 |      0 | bananas  |
|  1 |      1 | oranges  |
|  2 |      2 | peaches  |
|  3 |      3 | cherries |
|  4 |      4 | pears    |
+----+--------+----------+

Notice I ordered that query by modulo to show the entries in the same order as before. The “oldest” row, which is at the “front” of the queue, is now the one with the smallest value of id, so to see them “in queue order,” you can order by id:

select * from q order by id;
+----+--------+----------+
| id | modulo | fruit    |
+----+--------+----------+
|  1 |      1 | oranges  |
|  2 |      2 | peaches  |
|  3 |      3 | cherries |
|  4 |      4 | pears    |
|  5 |      0 | bananas  |
+----+--------+----------+

Method two: use REPLACE on MySQL

If it’s easier to write your query this way, or you need support on older versions of MySQL, you can use REPLACE instead of INSERT... ON DUPLICATE KEY UPDATE. Here’s an example:

replace into q(id, modulo, fruit)
   select
      (coalesce(max(id), -1) + 1),
      (coalesce(max(id), -1) + 1) mod 5,
      'bananas'
   from q;

The query may be more or less efficient, depending on your MySQL version, the storage engine you chose, and so forth. If I were doing this in production, I’d test it.

Methods three and four: on other platforms

Another option, which will allow the same easy single-query solution, is to use MERGE on Oracle and DB2. MERGE, REPLACE and friends are what we database folks call upsert queries, because they insert or update. If this functionality is available on other platforms, let me know. It looks like it’s still on the TODO list for PostgreSQL, and I’m fairly certain it’s not in SQL Server 2005. Perhaps a future release of one of these products will offer it.

Till then, I think the best option on these platforms would be a transaction with a couple statements to check the table and either insert or update (or delete and then insert, depending on how you want to do it). This fourth method can be made completely portable across platforms, if that’s important for your use case.

Things to think about

If you implement a system like this, consider the edge cases. Are you ever going to delete rows from the queue? If so, does that mess with the desired behavior of new inserts? Are there any ways you can get a hole in the sequence? If so, what happens — do you get too few rows in the queue, overwrite something other than the oldest row, or something else? If you need to “process” items in the queue, maybe you can just mark them as processed rather than deleting them.

What if you want to insert multiple rows at once? If you need to go that route on MySQL, my past articles might help. You could use advanced user variable techniques to number several rows at once. I also talked about related techniques in how to write INSERT IF NOT EXISTS queries.

Conclusion

In this article I showed you several ways to let a table grow to a fixed size, after which new rows replace old rows. I haven’t personally used this in production; this article grew out of a reader’s question (thanks for the stimulating topic!). If this article helped you, you should consider subscribing for future updates via email or feeds. It’s free and convenient.

Written by Xaprb

January 11th, 2007 at 3:45 pm

Posted in Uncategorized

Tagged with , , , , ,

21 Responses to 'How to implement a queue in SQL'

Subscribe to comments with RSS

  1. Hi, Baron.

    As a side note, the MySQL General Purpose Routine Library has provision for queues (as well as lists, stacks, associative arrays).

    Giuseppe

    Giuseppe Maxia

    11 Jan 07 at 5:45 pm

  2. A similar, though somewhat less elegant, approach: Staging Table.

    Scott Noyes

    11 Jan 07 at 6:02 pm

  3. Giuseppe, Scott: thanks! Quite different approaches — Scott, I think in some cases that trick with the update-in-a-join is very elegant for certain problems. (I can’t think of an example, I just have a gut feeling about it).

    Xaprb

    11 Jan 07 at 6:07 pm

  4. Baron, you are the man!

    This is exactly the kind of elegant solution I was hoping existed. Now we can replace our nasty hack that we have in place. Thanks again for writing this article. You really helped us out here.

    And thanks for mentioning me in the article. You make me feel special… ;)

    -E

    Eric Ryan Harrison

    12 Jan 07 at 9:43 am

  5. I’m quite hesitant to say what I want to say, and actually I feel stupid, because I might have missed something here, I’m no MySQL expert but why don’t you make it really simple!

    If I want to implement a dirty queue, I would just insert records with a timestamp, and simply process and delete the one with the smallest timestamp value (because) it was the first in. Of course, that wouldn’t be a fixed length queue. Wouldn’t this solution be faster as well? you have only two very simple queries on an Indexed column (timestamp).

    * For adding items you would just INSERT.

    * To get the first in, you would SELECT, then DELETE

    Ammar Ibrahim

    13 Jan 07 at 12:06 am

  6. Ammar, you are right, for most cases you just use your table as a normal table. The question I got was for a particular case of having it automatically restrict the queue to the N most recent items, so no manual deletion was required. It’s a special case, and I actually don’t know why it would be necessary. I was just responding to a reader inquiry, because I thought it was an interesting problem. Thanks for reading!

    Xaprb

    13 Jan 07 at 9:37 am

  7. Your solution is better than triggers…. here’s what I wrote in the postgres blog:

    Triggers are great, and I hadn’t realized Xarpb hadn’t thought of them. I thought he had, but figured that triggers were a worse performance hit, because then you end up doing another query.

    I’m guessing rules have the same thing — you have to do another query. You could also make a stored procedure called addToQueue or something, which has the INSERT, then “DELETE older rows” functionality. But it still does it in many query calls, whereas Xarpb’s solution does it in one.

    Sheeri

    14 Jan 07 at 1:48 pm

  8. Hi Sheeri. Thanks for writing in. One reason I didn’t think of triggers is that I never think of triggers: they are invisible, behind-the-scenes behavior that people don’t expect, and that scares me; it makes the system harder to understand and maintain. I don’t use them generally. True, they have their uses, but at the moment I don’t have any in production. So that’s probably why they didn’t come to mind.

    I am not sure about the performance hit. Behind the scenes, of course MySQL is doing several HANDLER operations wrapped up in that single query (read, update/insert, maybe more too). I would have to defer to a source-code expert as to whether this is less or more efficient. Anyone have a comment on that?

    I found it a very fun hack though, and apparently others did too, which makes me happy!

    Xaprb

    14 Jan 07 at 6:06 pm

  9. [...] Xarpb, for his article on how to implement a queue in SQL: [...]

  10. As the initial commenter who requested this article to be written, I see that you guys are having a hard time understanding the actual reason that we want this solution.

    We are using this solution to track and store query times for our users for each application we provide (mostly large database query applications). We are storing the ten most recent query times and then displaying the average query time (for each tool) when a query is made. We are doing this just to give the user a heads up regarding how long the queries are taking. Hence why we want to have a circular table like this.

    Originally, we WERE merely doing a dual insert/delete operation, but I posted the question here to Baron to see if MySQL had a more efficient way to do it in one line (which he beautifully provided).

    Does that clear this up for you guys?

    -E

    Eric Ryan Harrison

    16 Jan 07 at 10:39 am

  11. A nice article. I think the most interesting use case for a query like this would be to implement a RRD/MRTG kind of database, where (network performance) data comes in every x seconds, and you only need a weeks worth of data.

    I don’t know though why people say it is an “elegant” solution. A really elegant solution would not change the way the database is accessed. Why should I change the way I insert things in a database? I should just make standard queries and the database should take care of the rest, as happens with RRDtool.

    If more than one person is interfacing with a database like this, what is the probability that somebody forgets/gets this query wrong and messes up things? I guess mysql was not made for stuff like this.

    I nice blog nevertheless!!

    Thanos

    29 Jan 07 at 5:52 am

  12. If I were worried that people would query the table wrong, I’d restrict access via a stored procedure instead. But in real life, there is no such thing as a table that only allows you to insert things the right way. You can always insert something that violates business rules or makes something break in an application of any complexity.

    Xaprb

    29 Jan 07 at 8:43 am

  13. Another edge case — reaching the limit of the id field.

    If your application does 100 of these per second, an UNSIGNED INT will last 1.36 years — not so great. You should use an UNSIGNED INT because you’re never going to have a negative number, and a signed int would only last just over 8 months if there were 100 REPLACEs per second.

    (60 seconds/min, 60 min/hr, 24 hrs/day, 365 days/yr)
    4294967295/60/60/24/365/100=1.3619251950

    So, for scaling/known high traffic, use a BIGINT. However, in this case, do NOT use UNSIGNED, as all MySQL arithmetic is done with signed bigints or doubles. Not that it matters in this case; at 100 REPLACEs per second, you will wrap at 2.9 billion years:

    9223372036854775807/60/60/24/365/100=2,924,712,086.7753601074

    Let’s say your system does 10,000 of these REPLACEs per second, for eternity (our main database system, where we’re about to use this, average 6,000 qps, not all writes, but it’s a good figure to use for our own numbers) — move the decimal places a few spots over and you’re down to running out of numbers in 29 million years.

    That’s an OK limit for us. :)

    Anonymous

    5 Feb 07 at 4:15 pm

  14. Thanks for mentioning that BIGINT and UNSIGNED. I assume that in 29 million years, MySQL will have a HUGEINT type, too ;-)

    Xaprb

    6 Feb 07 at 9:10 am

  15. Note, this idea of a ‘queue’ is actually pretty similar to a round-robin database (RRD). Most of you are probably familiar with RRDTool. Here’s another article on the topic: Round-Robin Database in MySQL.

    Xaprb

    5 Mar 07 at 11:25 am

  16. What about using a certain timeframe for the fifo mechanism. E.g. when the timestamp between the insert of a new item is more than 60 days later than the timestamp of the first item, it replaces the first item.

    What would this mean to the insert query?

    Ivo

    30 Mar 07 at 4:27 am

  17. It might work to use a calculation like TO_DAYS(CURRENT_DATE) % 60, or some variation thereof.

    Xaprb

    30 Mar 07 at 9:19 am

  18. Great article! How do I ensure unique value in fruit column, say no other rows should have ‘peaches’ if it is already in queue

    Anish

    24 Jul 07 at 2:23 am

  19. [...] How to implement a queue in SQL at Xaprb This article explains how to create a fixed-size FIFO (first-in, first-out) queue in SQL, where rows added after a threshold will cause the oldest row to be deleted. There are several ways to do this, but MERGE on Oracle and DB2, and MySQL’s non-standard extensions to SQL, make an elegant solution easy. [...]

  20. Interesting approaches, but most of them have the problem to lock the table on concurrent access. If one takes InnoDB as table type, the performance of MAX() falls by the wayside and using MyISAM would bring the aforementioned concurrency problem when reading and writing in parallel. I use a modified approach by Robert Eisele as a my queue logic, which outperforms the presented approaches: http://www.xarg.org/2009/09/fast-circular-buffer-in-mysql/

    Thomas

    Thomas

    28 Jun 10 at 9:52 am

  21. Glancing at it briefly, I think it’ll suffer under high traffic. But, I believe, so will my original post above — I wrote this at a time when I didn’t have that much experience with queues in MySQL. Since then I’ve seen that it is a hard problem that rarely scales well.

    Xaprb

    29 Jun 10 at 8:52 am

Leave a Reply