How to simulate optional parameters in SQL
This article explains how to write WHERE clauses that accept optional parameters, reducing the need for separate queries. I have seen a lot of SQL code where the SELECT is duplicated many times, but the WHERE clause is different in each one, and only one will execute. It’s easy to factor out the differences in the WHERE clause. For example, consider the following code, assuming two optional parameters to the query:
if @param1 is not null
select * from table1 where col1 = @param1
else
select * from table1 where col2 = @param2
This can be rewritten:
select * from table1
where (@param1 is null or col1 = @param1)
and (@param2 is null or col2 = @param2)
Why it’s better this way
The duplication above may not seem like a big deal, but when the SELECT statement is 50 lines of code and there are 10 optional parameters, it’s a huge problem to have all that code duplicated:
- it’s hard to test
- the code is likely to get buggy when someone tries to change it
- it’s hard to be sure the repeated code is exactly the same in all the places it’s duplicated
To the last point: even worse, it’s hard to be sure the code isn’t the same either. In other words, if I suspect there are differences in the SELECT, all the duplication makes it hard to find the differences, so the code becomes much harder to read, and I’m much less confident I haven’t missed something. Note: if the code isn’t really repeated, insert a comment so someone like me doesn’t come along and try to “fix” it!
How it works
Any decent query optimizer should be able to recognize when a clause is always true or always false, and either delete the clause or terminate the query with no results. For example, pretend I have the following query:
select * from t1
where (? is null or c1 = ?)
and (? is null or c2 = ?)
Now suppose my parameters are 3 and NULL. What happens to the query?
select * from t1
where (3 is null or c1 = 3)
and (NULL is null or c2 = NULL)
Any decent optimizer, including MySQL’s optimizer, will rewrite that query:
select * from t1 where (c1 = 3)
You can use EXPLAIN EXTENDED, followed by SHOW WARNINGS, to see this at work in MySQL.
Possible reasons not to do this
There is a potential downside (besides “fixers” like myself getting into the code). The query optimizer might not be able to optimize the combined statement as well as the separate ones, so if this is mission-critical or gets called all the time, measure its performance. Be especially watchful as regards indexes. If there is an index on one of the columns in the WHERE clause, the combined statement might prevent the query optimizer from using the index, especially as the query becomes more complex. The bottom line, as always, is to consider whether performance matters, and if it does, measure performance, analyze the query plan, and proceed accordingly.



Another way is to use
COALESCE:COALESCEreturns the first non-null value in the list, so if@param1is supplied,col1is compared to it. If not,col1is compared to *itself* which always matches. The resulting behavior is that if a parameter is supplied, your code filters by it, otherwise it returns all rows.Eric K.
11 Jul 06 at 10:36 am
That is true. It has exactly the same logical result. But there are two important differences which may or may not matter:
One,
COALESCE()(or similar but not identical functions such asISNULL()orIFNULL(), which I’ve seen used as well) still results in a potentially large value being compared, which might be optimized out withIS NULL. It’s also a function call, which has overhead.Two, because it’s a function call it won’t be optimized out in the optimizer phase, if applicable. When using an interface which sends the commands in plain text, the optimizer can completely remove parts of the query, as I mentioned above in MySQL. If you have any inclination to see that in action, it’s kind of fun to do.
Theoretically, an optimizer could detect the query will be
col1 = COALESCE(NULL, col1)and know that can be rewritten ascol1 = col1, which can be optimized out altogether, but in practice, that’s a heck of a lot smarter and harder-to-write optimizer than I’d expect to find. I only have access to MySQL 5.x right now, but I expect other optimizers are no smarter than this:mysql> explain extended select col1 from tbl1 where col1 = coalesce(@param1, col1)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: col1 type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 397 Extra: Using where; Using index 1 row in set, 1 warning (0.00 sec) mysql> show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: select sql_no_cache `test`.`tbl1`.`col1` AS `col1` from `test`.`tbl1` where (`test`.`tbl1`.`col1` = coalesce((@param1),`test`.`tbl1`.`col1`)) 1 row in set (0.01 sec) mysql> explain extended select col1 from tbl1 where col1 = coalesce(null, col1)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tbl1 type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 397 Extra: Using where; Using index 1 row in set, 1 warning (0.00 sec) mysql> show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: select `test`.`tbl1`.`col1` AS `col1` from `test`.`tbl1` where (`test`.`tbl1`.`col1` = coalesce(NULL,`test`.`tbl1`.`col1`)) 1 row in set (0.00 sec)In that query @param1 is undefined, and you can see it results in the same query plan as using NULL. MySQL does consider the current value of variables when generating an optimized query plan, as you can see here:
mysql> explain extended select col1 from tbl1 where col1 = @param1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: Impossible WHERE noticed after reading const tablesThe function call is defeating the optimization. To optimize that out, the optimizer would have to know the behavior of every function, which isn’t its job. In general, function calls can defeat a lot of optimizations.
On the other hand, as I said before:
mysql> explain extended select col1 from tbl1 where (@param1 is null or col1 = @param1)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: tbl1 type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 397 Extra: Using index 1 row in set, 1 warning (0.00 sec) mysql> show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: select sql_no_cache `test`.`tbl1`.`col1` AS `col1` from `test`.`tbl1` 1 row in set (0.00 sec)The optimizer removed the
WHEREclause completely.Stored procedures, and other situations where a query plan might be cached and re-used later, are always a factor to consider. In Microsoft SQL Server, for example, the first time a stored procedure is executed its query plan is cached for re-use (mostly… this can be defeated intentionally or with bad practices), which avoids the costs associated with generating a query plan. That can be a fairly significant optimization, as generating a query plan involves looking at index statistics, etc etc. But it also means whatever parameter is supplied for the first invocation of the stored procedure influences the cached query plan, so the query might be well-optimized for
@param1but not at all for@param2— and it won’t get re-evaluated for@param2.Xaprb
11 Jul 06 at 10:56 am
Heya,
Shouldn’t this:
select * from table1 where (col1 is null or col1 = @param1) and (col2 is null or col2 = @param2)read:
select * from table1 where (@param1 is null or col1 = @param1) and (@param2 is null or col2 = @param2)? Cheers, Will
core
15 Aug 06 at 6:49 am
DOH! You are right! I’ll amend the article.
Xaprb
15 Aug 06 at 7:13 am
Hello,
Is there a performance penalty between these 2 statements:
select * from table1 where (@param1 is null or col1 = @param1) and (@param2 is null or col2 = @param2)select * from table1 where (@param1 is null or col1 like @param1) and (@param2 is null or col2 like @param2)This would allow the user to key in e.g. SMIT% to search for a customer name starting with SMIT, but if he keys in SMITH, then only the exact name would appear in the result list.
Is this good practice, or should I avoid?
Thanks for this great article!
Locusta
29 Jan 07 at 12:51 pm
There can be. If you enter SMIT% then the database server can do a prefix match, which can use indexes. If you enter %MIT% it becomes a full-length substring search, which is not indexable. So yes, there is always a potential for a bad query to result.
Xaprb
29 Jan 07 at 12:57 pm
[...] certainly be using this more often. I found an article the other day explaining how to simulate optional parameters in SQL without duplicating a single SELECT statement several times, each with a different WHERE [...]
Frederick Lin : Optional Parameters in SQL
22 Oct 07 at 9:01 pm