How to simulate optional parameters in SQL

This article explains how to write WHERE clauses that accept optional parameters, reducing the need for separate queries. I have seen a lot of SQL code where the SELECT is duplicated many times, but the WHERE clause is different in each one, and only one will execute. It’s easy to factor out the differences in the WHERE clause. For example, consider the following code, assuming two optional parameters to the query:

if @param1 is not null
    select * from table1 where col1 = @param1
else
    select * from table1 where col2 = @param2
This can be rewritten:
select * from table1
where (@param1 is null or col1 = @param1)
    and (@param2 is null or col2 = @param2)

Why it’s better this way

The duplication above may not seem like a big deal, but when the SELECT statement is 50 lines of code and there are 10 optional parameters, it’s a huge problem to have all that code duplicated:

  • it’s hard to test
  • the code is likely to get buggy when someone tries to change it
  • it’s hard to be sure the repeated code is exactly the same in all the places it’s duplicated

To the last point: even worse, it’s hard to be sure the code isn’t the same either. In other words, if I suspect there are differences in the SELECT, all the duplication makes it hard to find the differences, so the code becomes much harder to read, and I’m much less confident I haven’t missed something. Note: if the code isn’t really repeated, insert a comment so someone like me doesn’t come along and try to “fix” it!

How it works

Any decent query optimizer should be able to recognize when a clause is always true or always false, and either delete the clause or terminate the query with no results. For example, pretend I have the following query:

select * from t1
where (? is null or c1 = ?)
    and (? is null or c2 = ?)

Now suppose my parameters are 3 and NULL. What happens to the query?

select * from t1
where (3 is null or c1 = 3)
    and (NULL is null or c2 = NULL)

Any decent optimizer, including MySQL’s optimizer, will rewrite that query:

select * from t1
where (c1 = 3)

You can use EXPLAIN EXTENDED, followed by SHOW WARNINGS, to see this at work in MySQL.

Possible reasons not to do this

There is a potential downside (besides “fixers” like myself getting into the code). The query optimizer might not be able to optimize the combined statement as well as the separate ones, so if this is mission-critical or gets called all the time, measure its performance. Be especially watchful as regards indexes. If there is an index on one of the columns in the WHERE clause, the combined statement might prevent the query optimizer from using the index, especially as the query becomes more complex. The bottom line, as always, is to consider whether performance matters, and if it does, measure performance, analyze the query plan, and proceed accordingly.

Technorati Tags:No Tags

You might also like:

  1. How to write a SQL exclusion join
  2. SQL Server 2000 date and time puzzler
  3. How to write SQL JOIN clauses more compactly
  4. Why large IN clauses are problematic
  5. How to find duplicate rows with SQL

7 Responses to “How to simulate optional parameters in SQL”


  1. 1 Eric K.

    Another way is to use COALESCE:

    SELECT * FROM table1
    WHERE col1= COALESCE(@param1,col1)
    AND col2 = COALESCE(@param2,col2)

    COALESCE returns the first non-null value in the list, so if @param1 is supplied, col1 is compared to it. If not, col1 is compared to *itself* which always matches. The resulting behavior is that if a parameter is supplied, your code filters by it, otherwise it returns all rows.

  2. 2 Xaprb

    That is true. It has exactly the same logical result. But there are two important differences which may or may not matter:

    One, COALESCE() (or similar but not identical functions such as ISNULL() or IFNULL(), which I’ve seen used as well) still results in a potentially large value being compared, which might be optimized out with IS NULL. It’s also a function call, which has overhead.

    Two, because it’s a function call it won’t be optimized out in the optimizer phase, if applicable. When using an interface which sends the commands in plain text, the optimizer can completely remove parts of the query, as I mentioned above in MySQL. If you have any inclination to see that in action, it’s kind of fun to do.

    Theoretically, an optimizer could detect the query will be col1 = COALESCE(NULL, col1) and know that can be rewritten as col1 = col1, which can be optimized out altogether, but in practice, that’s a heck of a lot smarter and harder-to-write optimizer than I’d expect to find. I only have access to MySQL 5.x right now, but I expect other optimizers are no smarter than this:

    mysql> explain extended select col1 from tbl1 where col1 = coalesce(@param1, col1)G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: col1
             type: index
    possible_keys: NULL
              key: PRIMARY
          key_len: 4
              ref: NULL
             rows: 397
            Extra: Using where; Using index
    1 row in set, 1 warning (0.00 sec)
    
    mysql> show warningsG
    *************************** 1. row ***************************
      Level: Note
       Code: 1003
    Message: select sql_no_cache `test`.`tbl1`.`col1` AS `col1` from `test`.`tbl1` where (`test`.`tbl1`.`col1` = coalesce((@param1),`test`.`tbl1`.`col1`))
    1 row in set (0.01 sec)
    
    mysql> explain extended select col1 from tbl1 where col1 = coalesce(null, col1)G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: tbl1
             type: index
    possible_keys: NULL
              key: PRIMARY
          key_len: 4
              ref: NULL
             rows: 397
            Extra: Using where; Using index
    1 row in set, 1 warning (0.00 sec)
    
    mysql> show warningsG
    *************************** 1. row ***************************
      Level: Note
       Code: 1003
    Message: select `test`.`tbl1`.`col1` AS `col1` from `test`.`tbl1` where (`test`.`tbl1`.`col1` = coalesce(NULL,`test`.`tbl1`.`col1`))
    1 row in set (0.00 sec)

    In that query @param1 is undefined, and you can see it results in the same query plan as using NULL. MySQL does consider the current value of variables when generating an optimized query plan, as you can see here:

    mysql> explain extended select col1 from tbl1 where col1 = @param1G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: NULL
             type: NULL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: NULL
            Extra: Impossible WHERE noticed after reading const tables

    The function call is defeating the optimization. To optimize that out, the optimizer would have to know the behavior of every function, which isn’t its job. In general, function calls can defeat a lot of optimizations.

    On the other hand, as I said before:

    mysql> explain extended select col1 from tbl1 where (@param1 is null or col1 = @param1)G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: tbl1
             type: index
    possible_keys: NULL
              key: PRIMARY
          key_len: 4
              ref: NULL
             rows: 397
            Extra: Using index
    1 row in set, 1 warning (0.00 sec)
    
    mysql> show warningsG
    *************************** 1. row ***************************
      Level: Note
       Code: 1003
    Message: select sql_no_cache `test`.`tbl1`.`col1` AS `col1` from `test`.`tbl1`
    1 row in set (0.00 sec)

    The optimizer removed the WHERE clause completely.

    Stored procedures, and other situations where a query plan might be cached and re-used later, are always a factor to consider. In Microsoft SQL Server, for example, the first time a stored procedure is executed its query plan is cached for re-use (mostly… this can be defeated intentionally or with bad practices), which avoids the costs associated with generating a query plan. That can be a fairly significant optimization, as generating a query plan involves looking at index statistics, etc etc. But it also means whatever parameter is supplied for the first invocation of the stored procedure influences the cached query plan, so the query might be well-optimized for @param1 but not at all for @param2 — and it won’t get re-evaluated for @param2.

  3. 3 core

    Heya,

    Shouldn’t this:

    select * from table1
    where (col1 is null or col1 = @param1)
        and (col2 is null or col2 = @param2)

    read:

    select * from table1
    where (@param1 is null or col1 = @param1)
        and (@param2 is null or col2 = @param2)

    ? Cheers, Will

  4. 4 Xaprb

    DOH! You are right! I’ll amend the article.

  5. 5 Locusta

    Hello,

    Is there a performance penalty between these 2 statements:

    select * from table1
    where (@param1 is null or col1 = @param1)
        and (@param2 is null or col2 = @param2)
    select * from table1
    where (@param1 is null or col1 like @param1)
        and (@param2 is null or col2 like @param2)

    This would allow the user to key in e.g. SMIT% to search for a customer name starting with SMIT, but if he keys in SMITH, then only the exact name would appear in the result list.

    Is this good practice, or should I avoid?

    Thanks for this great article!

  6. 6 Xaprb

    There can be. If you enter SMIT% then the database server can do a prefix match, which can use indexes. If you enter %MIT% it becomes a full-length substring search, which is not indexable. So yes, there is always a potential for a bad query to result.

  1. 1 Frederick Lin : Optional Parameters in SQL

Leave a Reply

Please do not use this blog to get help with problems or bugs in Maatkit or innotop: use the appropriate forums, mailing list, or bug trackers. If you're asking for help with MySQL, please use the MySQL mailing list instead.