I have found a bug in Microsoft SQL Server’s replace() function. The function treats char and varchar differently, and replacements will fail in some cases. Here is a demonstration of the bug:
declare @string varchar(40),
@find char(2),
@replace char(1)
set @string = 'two spaces'
set @find = ' '
set @replace = ' '
if replace(@string, @find, @replace) = 'two spaces'
print 'Replacement worked'
else
print 'Replacement failed'
Result: “Replacement failed.” This is clearly incorrect behavior. To make it more interesting, here are some test cases that succeed:
- Change the type of
@findtovarchar(2). - Change the value of @string to
' '(two spaces). - Use
replicate(@replace, 2)instead of@find.
I found another discussion about this on Google groups, but it’s incoherent, doesn’t demonstrate the problem clearly, and speculates about the internal causes rather than suggesting ways to avoid the behavior (pointless). So I count that as “not posted elsewhere” and offer it here.
Technorati Tags:No Tags
I did a bit more testing on this and it appears to be a problem with the replace function trimming char variables. Changing the section
to
and outputting the replace exhibits the extent of the problem. The result is:
Furthermore, changing
@stringto'This is a test'and@find = 's 'with@replace = '*'results in:Note that the
's'in'test'has been changed although it has no spaces anywhere around it. This is the strongest evidence that trimming is going on.Leading spaces are not affected. The more complex
@stringwith@find = ' i'results in:as expected.
The replicate inside the replace works because replicate returns a varchar and so actually falls into the first case you mentioned which, as you noted, works correctly.
Finally, it is not a problem with char variables in general, because the following code honors the ANSI_PADDING setting while the replace issue does not:
In my work with extended procedures I have found that string constants appear to be passed as varchars and the string functions also return varchars, so unless you are reading the values from a table (in which case the variable should match the column definition) using varchars for string variables might be a good practice anyway by matching the source type (don’t even get me started on the implicit type conversions. Oh, for a strongly typed language…) At worst even if the constant string completely fills the varchar variable the extra two (or four) bytes of storage are only two bytes, not gigabytes as it would be in a billion row table.
The documentation for
spacestates the return type ischarbut the following still succeeds:if replace('two spaces', space(2), space(1)) = 'two spaces' print 'Replacement worked' else print 'Replacement failed'Apparently the return value of the function is either not as stated, or the input to the
replacefunction isn’t where it’s being trimmed.Microsoft’s Abdy Iman has responded to tell me someone is looking into the bug. They’ve confirmed it and that it is still present in SQL Server 2005.
I tripped over this bug too when trying to “unabbreviate” data in a view. The following failed because of the bug:
SELECT DISTINCT REPLACE(REPLACE(department, 'sch of ', 'School of '), 'sch ', 'School of ') AS deptExpanded, [position] FROM vWebDirectory ORDER BY deptExpanded, [Position]All instances of ’sch’ were replaced by ‘School of’, including within ‘School’ etc.
An ugly workaround was developed that may be useful in similar circumstances, see below;
SELECT DISTINCT CASE WHEN charindex('sch of ', department) 0 THEN replace(department, 'sch of ', 'School of ') WHEN charindex('sch ', department) 0 THEN replace(department, 'sch ', 'School of ') ELSE department END AS deptExpanded, [position] FROM vWebDirectory ORDER BY deptExpanded, [Position]This solved the problem for me…
select replace(fld, N’ ‘, N’ ‘)
from table
This would help when fld is an nvarchar instead of varchar.