My wishlist for SQL: the UNTIL clause
I’d like an UNTIL clause, please. I’d use it sort of like LIMIT in MySQL and PostgreSQL, except that it would define when to stop returning looking for rows, instead of defining how many to return. Example:
SELECT * FROM users ORDER BY user_id UNTIL user_id >= 100;
That would select users up to and including user 99. Ideally the clause could accept any boolean predicate, including subqueries. I’ll hold my breath and wait for this wish to come true now.


Wouldn’t you still need an ORDER BY?
Greg Sabino Mullane
22 Jan 10 at 7:43 pm
Why not simply write
SELECT *
FROM users
WHERE user_id < 100
Hans
22 Jan 10 at 8:12 pm
Or you can use a stored procedure, use a cursor and a temporary table?
Antony Curtis
22 Jan 10 at 9:05 pm
its useful, even though user id is a bad example.. but the results are dictated by the ORDER BY clause as done in the case of LIMIT
this is especially useful for steaming data, but stored procedure already supports REPEAT-UNTIL condition
Venu
22 Jan 10 at 9:24 pm
think can see the utility of this, but a better example query would be good, as Hans says this is trivially solved.
One I can think of is
SELECT * FROM items UNTIL COUNT(DISTINCT user_id) = 100;
which would give me 100 users worth of items, and as there is no group by could be multiple per user. (and yes I know this query could be easily written as a subquery)
Barry
22 Jan 10 at 9:28 pm
I’m with Hans
Alex
22 Jan 10 at 9:29 pm
ok, maybe
SELECT * FROM products UNTIL SUM(value) > 100;
is a better example, less easily solved otherwise?
Barry
22 Jan 10 at 9:33 pm
Barry, you still need an ORDER BY or that’s an indeterministic pull and summing. It’s also very easily solved by a stored procedure.
Greg Sabino Mullane
22 Jan 10 at 10:01 pm
There are always ways to do something like this, though I generally don’t like cursors and stored procedures. (Kind of hard to put those into a subquery if you wanted to do that, depending on the platform.) Although in a lot of cases you can use a WHERE clause for this type of functionality, sometimes this would be a nicer semantic sugar. And yes you’d need an ORDER BY unless you’re willing to gamble on the order the rows come back.
I would go out on a limb here and say that LIMIT, OFFSET was similarly “not needed” but then after it was provided, became much loved. You can do it in SQL Server 2000 with nested subqueries and two reverse sorts and two TOP clauses, so you “don’t need it”. But it sure isn’t as nice. Betcha if we had an UNTIL clause a lot of people would use it.
Xaprb
22 Jan 10 at 10:35 pm
Use CTE, and switch to PostgreSQL luke ;)
gregj
23 Jan 10 at 2:58 am
I am really not understanding here. This proposal seems to assume that a result set has some defined ordering, sans an ORDER BY clause. It is not.
A result set does not have to come back in the same order that the rows were inserted into the table. They don’t even have to come back in the same ordering twice.
This UNTIL clause would give inconsistant results, because of this fact.
Mark Atwood
23 Jan 10 at 3:36 am
I don’t really see the use of this, and how this is different from a WHERE clause. The examples above caneither use a WHERE clause or contains a mix of two different data sets (i.e. an AGGREGATE mixed with a non-aggregate as in the SUM and COUNT examples).
There are some uses for mixed sets of data sets,but I think the above examples are not good enough,except possibly the SUM example above.
Also I have to admit that I find the LIMIT clause without an ORDER BY to be a rather annoying construct, useful yes, in-line with relational thinking,not so much.
Anders Karlsson
23 Jan 10 at 6:16 am
To be honest I considered the lack of ORDER BY just for brevity in the example, and I followed that, a real usage would have a ORDER BY – as Xaprb reply confirms. (In fact just like LIMIT – you mostly use a ORDER BY with it)
Barry
23 Jan 10 at 8:39 am
I think C.J. Date would not approve ;)
But it would be a useful addition to the syntax; though I agree would mostly make sense with ORDER BY.
Regards
Shlomi Noach
23 Jan 10 at 10:38 am
I’ve updated the post to include the ORDER BY. Leaving that off was an oversight.
Date would definitely not approve. I’ll have to think of a really good example where this would help. The problem is, they are complex. They usually have to be solved with nasty subqueries. These are often the kinds of things I use user-variables for in MySQL. (Anders, do you find @variables useful?)
It is a bit frustrating to me that I’ve so often wished for a way to say “stop looking for more rows now” but I can’t think of the examples anymore. There is a difference between “stop looking for more rows” and “filter out rows that don’t match.”
Xaprb
23 Jan 10 at 12:14 pm
“stop looking for more rows” could always be achieved by using e.g. row_number() or LIMIT in a sub-select
I still don’t see the point of the UNTIL keyword.
What additional features should have compared to the existing LIMIT/row_number() solutions
Hans
23 Jan 10 at 12:49 pm
To be frank,although there are uses for the @variables, I don’t like them. I believe in rather strict separation of what goes into the database and what goes into application code. One reason for this is SQL is declarative (although not always in a conistent manner) whereas something like C and Java is procedural. And I oppose stuff that contradicts this even in standard SQL. But I am also usually OK with other views on this, and in the end, I am also a pragmatist.
So even though I don’t like the LIMIT clause for example (if I want 10 rows, I fetch 10 rows in my application), if someone has anapplication that does the job and use the LIMIT clause, then so be it.
/Karlsson
Anders Karlsson
23 Jan 10 at 2:29 pm
@Hans,
With LIMIT you need to know the exact number of rows beforehand
@Hans, @Baron,
I can think of a few examples. For one: I want to select all the log entries which were prior to an error entry. I want these ordered chronologically. There’s an index on the timestamp which is going to be used. Sure, WHERE would work, but would make MySQL examine many more rows than it has to. UNTIL can be handy
@Andres,
How would you do paging without LIMIT (I’m interested what kind of solution that would be).
I’ve found using @variables can make for a performance boost. I think this is a strong factor.
Shlomi Noach
23 Jan 10 at 2:38 pm
@Shlomi
“With LIMIT you need to know the exact number of rows beforehand”
No, necessarily
You can do something like this (at least in Postgres)
SELECT *
FROM some_table
LIMIT (SELECT count(*) FROM other_table)
OFFSET (SELECT max(col) FROM third_table);
Hans
23 Jan 10 at 2:51 pm
@Hans,
Not in MySQL, I’m afraid ;)
And, actually, your example presents a compelling reason for having UNTIL. A three-SELECT query which can be solved with one seems reason enough.
Shlomi Noach
23 Jan 10 at 3:09 pm
Schlomi: What you are stating is that UNTIL would do the same thing as a WHERE clause, but optimized differently. I say this is wrong way to do things. If what you want to do CAN be expressed with current constructs (WHERE in this case), but would bebadly optimized, then this is a case of enhancing the optimizer. The declarative nature (this is what I want, not how) of SQL really should be preserved. And a WHERE clause doesn’t imply that an index is used, rather, it is a definition of what I want, and then it’s up to the optimizer to determine index usage.
As for paging:
First page: select and fetch 20 (or whatever) rows.
Second page: select, fetch and skip 20 rows, the display the next 20 rows.
Now, is this inefficient? In MySQL, yes, as resultsets are always materialized. But this is again an optimizer issue in my mind, i.e. if we weren’t materialzing, this approach would be no different than doing it in SQL. But also read my previous comments regarding pragmatism, and I think from that POV, LIMIT is in a grey zone.
Also, the way the MySQL Client library (again, materializing on the client side this time), this makes paging without LIMIT even more of an overhead. So in this case LIMIT helps fix a client API issue, which in my mind is not how SQL should be evolved. But this is my personal view.
/Karlsson
Anders Karlsson
23 Jan 10 at 3:10 pm
@Shlomi
> Not in MySQL, I’m afraid ;)
Another reason to move to Postgres :)
> And, actually, your example presents a
> compelling reason for having UNTIL.
> A three-SELECT query which can be solved with one
> seems reason enough.
Hmm, you wanted to have a way were you are not forced to know the number (of rows) beforehand. And a (sub-)query is the only way to dynamically retrieve a value in SQL. Everything else would be procedural code that is already possible anyway in Postgres (and MySQL)
Hans
23 Jan 10 at 3:23 pm
@Andres
“…What you are stating is that UNTIL would do the same thing as a WHERE clause, but optimized differently…”
Indeed, my meaning exactly. I am somewhat more flexible with declarative / non-declarative SQL. I mean, I don’t mind it that much if it’s not 100% declarative.
Paging: skipping (on application side) 100 rows to get the next 20 rows means network overhead, more time for locks held, buffers used.
I’m glad to compromise the delarativeness of SQL to avoid this overhead. Can you find an optimizer solution for that?
Regards
Shlomi Noach
23 Jan 10 at 3:28 pm
@Hans,
Right you are, but procedural code is so much more overhead to code and maintain, plus (with MySQL) it’s slower.
Ultimately, we always have to deal with limitations. The products we use are never perfect. We always need to patch this and that. I don’t find this too dirty to use a non-ANSI, non declarative syntax here and there to make things work better.
Regards
Shlomi Noach
23 Jan 10 at 3:32 pm
If this really IS an optimization, and not conceptually different from a WHERE caluse, i.e. the result would be the same as a WHERE clause but the optimization would be different, then I suggest that is added as a just that, an optimizer hint and not an extension to the base syntax. I.e. SELECT … FROM … WHERE SQL_UNTIL … or something like that.
Also, as far as these optimizations go, note that not all engines are equal, and not all optimizations are valid in all engines (for example the non-index ones are different). Doing it this was, and engine that handles push-down conditions could ignore the UNTIL clause if it was not valid.
And again, this assumes that the UNTIL clause is just an optimization, which I’m not sure of was the original intent.
Anders Karlsson
25 Jan 10 at 5:37 am
Yes, the multitude of storage engines has always been kind of a nightmare with regard to what syntax each and everyone accepts. I don’t like it.
I would put UNTIL on the same level as LIMIT. If you consider LIMIT an optimization, then so is UNTIL. I’m not sure how to classify them.
Shlomi Noach
25 Jan 10 at 5:56 am
[...] Schwartz inspires a lot of conversation with his post, My wishlist for SQL: the UNTIL clause. Baron says, “I’d like an UNTIL clause, please. I’d use it sort of like LIMIT in MySQL and [...]
Log Buffer #176: a Carnival of the Vanities for DBAs | The Pythian Blog
29 Jan 10 at 3:03 pm
To me, I don’t think UNTIL is a very good idea:
1. The semantics of a “result set” is basically thrown out the window. A query is supposed to ask for data as if it were a set immediately available at the time of the query. Otherwise, you’re just using *procedural code* (which UNTIL just seems like perverting query syntax for the sake of an implicit loop). In other words, what about the halting problem? If you forget a WHERE clause, the worst case is you retrieve too much data. If you don’t make UNTIL have a sane terminating condition, then you can create an infinite loop. Depending on the database engine, this can cause a death spiral of locks, memory leaks, etc. The *semantics* of a query IS NOT a loop, so continuing a loop UNTIL some condition is non-sensical. You ask for the data, and based on the predicates the data either gets returned or they don’t, as a single result.
2. The argument that subqueries are doing too much whereas “one query” of UNTIL is faster is a non-argument. There IS NO “one query” for UNTIL: it’s semantically a loop of multiple queries (like you’d do with a procedure anyway). At least subqueries can be optimized by the planner. How is the query planner supposed to plan for something when there may not even be a genuine terminating condition?
But maybe there are genuine reasons for this, maybe either I’m not understanding or the concept isn’t being explained clearly enough. But then again, maybe that’s a sign that it’s not a good idea (given the list of problems stated already)?
Rob W
8 Feb 10 at 7:48 pm
[...] Baron’s wishlist for SQL can also benefit from monotonic [...]
Monotonic functions, SQL and MySQL | code.openark.org
24 Feb 10 at 5:50 pm
I just stumbled across a problem where an UNTIL clause would probably indeed make sense:
I have a column containing strings, and I want to find those strings that are a prefix to a given string:
E.g. If the given string is ‘Hello World’ then rows containing ‘Hell’, and ‘Hello’ should match.
A query like
SELECT * FROM table t WHERE t.str <= 'Hello World' ORDER BY t.str DESC UNTIL ('Hello World' NOT_STARTS_WITH t.str)
would archive that.
(whatever SQL-function corresponds to the NOT_STARTS_WITH predicate)
So in my case I want to physically iterate over the b-tree index, until some condition is no longer satisfied.
Claus
3 Aug 10 at 11:30 am
Never mind, just realized that doesn’t work for my use-case, as I’m not having a linear order here, so I can’t stop scanning even a sorted table when a certain condition is met.
Claus
3 Aug 10 at 12:57 pm