Microsoft SQL Server’s
openxml function is highly CPU-intensive. This article demonstrates more efficient alternatives to
Why people use
A common usage for
openxml is to use XML to pass a “list” or “table” of values into a stored procedure. Here is a scenario I’ve seen a lot at my current job:
- An ASP accepts a string as input, such as “ipod car adapter”
- The ASP parses the input into space-separated values, then concatenates them together into XML
- The ASP passes the XML to a stored procedure
- The sproc uses
openxmlto populate a temporary table from the XML
- The sproc does some work (such as searching a product catalog)
This is fine, except
openxml can cause a huge CPU spike, which is bad news in a Web-facing search server during holiday shopping season.
I have rewritten such a search sproc to remove
openxml, and found it to account for upwards of three-fourths of the total cost, even in an otherwise very expensive sproc. I have anecdotal wisdom from other DBAs about people who over-use
openxml and end up with systems that run hot all the time.
A list of simple delimited values doesn’t need XML — it’s overkill. A better alternative is to pass the string directly into the sproc, and write a UDF to return a table with the elements of the string. It’s still string parsing, but it’s better than using XML.
Solution 1: a
WHILE loop in a user-defined function
Here is a user-defined function that will split a delimited set of words into a table and return the table:
CREATE function dbo.fn_SplitWords ( @Words varchar(8000), @Delim char(1)) returns @Words_table table(word varchar(255), ident int identity not null) as begin declare @Wordstart int, @WordEnd int, @DoubleDelim varchar(2) set @Wordstart = 1 set @WordEnd = 1 set @DoubleDelim = replicate(@Delim, 2) -- Prepare the data while charindex(@DoubleDelim, @Words) > 0 set @Words = replace(@Words, @DoubleDelim, @Delim) -- Left-trim it if left(@Words, 1) = @Delim begin if @Delim = ' ' set @Words = ltrim(@Words) else set @Words = substring( @Words, patindex('%[^' + @Delim + ']%', @Words), len(@Words) - patindex('%[^' + @Delim + ']%', @Words) + 1) end -- Right-pad it if right(@Words, 1) <> @Delim set @Words = @Words + @Delim while @Wordstart > 0 begin -- Extract the next word set @WordEnd = charindex(@Delim, @Words, @Wordstart) if @WordEnd > @WordStart begin insert into @Words_table select substring(@Words, @Wordstart, @WordEnd - @Wordstart) set @Wordstart = @WordEnd + 1 end else set @Wordstart = 0 -- Terminate the loop end return end
Please note the bug I’m avoiding in the
while loop above. I explain the SQL Server 2000
replace bug in another post.
Here are some test calls for the UDF:
select * from dbo..fn_SplitWords('this is a test call', ' ') select * from dbo..fn_SplitWords(' this is a test call', ' ') select * from dbo..fn_SplitWords('this is a test call ', ' ') select * from dbo..fn_SplitWords('this-is-a-test-call', '-') select * from dbo..fn_SplitWords('this is a test call', ' ') select * from dbo..fn_SplitWords(' ', ' ') select * from dbo..fn_SplitWords('', ' ')
After I wrote this, I saw someone else did the same thing elsewhere, though in my opinion very poorly implemented and explained. In any case, here’s a link for the sake of completeness: Treat Yourself to Fn_Split()
A more efficient approach
Instead of using a loop as I did above, it’s actually much more efficient to use an integers table and a
JOIN to parse the tokens apart. This approach is slightly less flexible, and doesn’t handle all the special cases I handled above with my UDF, such as tokens being separated by several delimiters instead of just one. Regardless, it is absolutely a better way to go, as long as the input is well-formed. It doesn’t use any nonstandard SQL, either — it’s a relational solution to the problem. Here are three resources where you can learn more about this extremely elegant technique:
Other advantages to a UDF
This approach has other advantages over
- it’s easy to string-ify a table for input to another sproc from within SQL (see my article about concatenating strings in SQL for more)
- it is much simpler, needs much less code, and is easier to understand and maintain
- the UDF can be used directly in a FROM clause; there is no need to create a temporary table or table variable unless you want to store and re-use the values
- you can use table variables instead of temporary tables, saving disk I/O and locks in the
- you can pass more data without running into upper limits on the size of varchar, because space-delimited is much less verbose than XML
I would be remiss if I didn’t mention the downsides:
- string-parsing is never efficient, and can be error-prone
- you will need to create and maintain UDFs (in my case, I need at least two — one for strings and one for integers)
- you have less flexibility about types and schemas; this technique is only convenient for simple cases
More about efficiency
SQL server seems to be smart enough to reuse resources within a query batch, so the high cost of using
openxml only seems to happen on the first invocation in a batch. When I benchmarked it with a thousand iterations, the string-parsing solution’s constant cost appeared to be about half the constant cost of
openxml — not a significant improvement. However, in the common case where it’s used only once, the string-parsing is much more efficient because there is no startup cost.
All in all, I think string-parsing is the lesser of the evils.