How to avoid VBScript regular expression gotchas
VBScript regular expressions are slightly troublesome, though they certainly help turn VBScript into less of a joke when it comes to text processing. The syntax lacks some of the niceties of Perl or .NET regexes, but is complete enough to be very useful. This article shows you how to avoid potentially serious problems, and explains an undocumented feature.
Undocumented and incorrect behavior
- The documentation is incomplete. The RegExp object has an undocumented property, MultiLine, which controls where the ^ and $ anchors can match. Its default value is False, so ^ and $ match only at the beginning and end of the whole string. When MultiLine is True, they also match at the beginning and end of each line, like Perl's /m flag or .NET's RegexOptions.Multiline. MultiLine does not change the . metacharacter. With either setting, . matches any character except a line feed (\n), although it does match a carriage return (\r). There is no equivalent of Perl's /s flag.
- Character classes do not help with matching across newlines. It is tempting to try patterns such as [.\n]* and [.\s]*, but inside brackets . is an ordinary period, not a wildcard, so these classes match only periods and newline (or whitespace) characters. That is why such a pattern can fail to match even when no newlines are involved. A class that covers every character, such as [\s\S]*, does what was intended.
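The following VBScript sketch checks how MultiLine, the . metacharacter, and bracketed classes behave. It assumes Windows Script Host (run it with cscript.exe, where the RegExp class is built in). FirstMatch is a hypothetical helper written only for this illustration.

```vbscript
' FirstMatch is a hypothetical helper: it returns the first match of
' pattern in text, or "" when there is none.
Function FirstMatch(pattern, text, useMultiLine)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = pattern
    re.MultiLine = useMultiLine
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        FirstMatch = matches(0).Value
    Else
        FirstMatch = ""
    End If
End Function

Dim s
s = "hello" & vbCrLf & "world"

' . stops at the line feed with either MultiLine setting. It does match
' the carriage return, so both results are "hello" & vbCr (6 characters).
WScript.Echo Len(FirstMatch(".+", s, False))   ' 6
WScript.Echo Len(FirstMatch(".+", s, True))    ' 6

' MultiLine changes only the anchors: ^ may now match after the line feed.
WScript.Echo "[" & FirstMatch("^world", s, False) & "]"   ' []
WScript.Echo "[" & FirstMatch("^world", s, True) & "]"    ' [world]

' Inside brackets . is a literal period, so h[.\n]*o cannot match "hello".
' [\s\S] matches any character and runs on to the last "o" in "world".
WScript.Echo "[" & FirstMatch("h[.\n]*o", s, False) & "]"  ' []
WScript.Echo Len(FirstMatch("h[\s\S]*o", s, False))       ' 9
```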
How to avoid memory leaks
There is a memory-leak bug which has been reported elsewhere on the Internet. An expression with more than 10 subexpressions in Global mode can leak memory. To avoid this bug, don’t use a pattern with more than 10 subexpressions if the object’s Global property is set to True.
Documentation links
Here are two links to the Microsoft VBScript RegExp documentation on MSDN:

I had an experience with VBScript where I was trying to eliminate trailing blank spaces and tabs from lines of text. VBScript would not match the regular expression pattern [ \t]+$. Instead, I had to use something like ([ \t]+)[\n\r]{1,2} and replace it with vbCrLf.
Potosino
16 Aug 06 at 2:34 pm
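Potosino's trouble with [ \t]+$ is most likely about line endings. Without MultiLine, $ matches only at the end of the whole string; with it, and assuming $ matches just before the line feed, the carriage return of a Windows line sits between the blanks and the anchor. The sketch below, using a hypothetical helper named TrimTrailingBlanks, sidesteps the question by capturing an optional \r and writing it back.

```vbscript
' Remove spaces and tabs at the end of every line, keeping CRLF or LF endings.
' TrimTrailingBlanks is a hypothetical helper name.
Function TrimTrailingBlanks(text)
    Dim re
    Set re = New RegExp
    re.Global = True      ' fix every line, not just the first one
    re.MultiLine = True   ' let $ match at the end of each line
    ' The optional (\r?) absorbs a carriage return so that $ can be
    ' satisfied just before the line feed; $1 puts the \r back.
    re.Pattern = "[ \t]+(\r?)$"
    TrimTrailingBlanks = re.Replace(text, "$1")
End Function

Dim sample, cleaned
sample = "one  " & vbCrLf & "two" & vbTab & vbCrLf & "three "
cleaned = TrimTrailingBlanks(sample)
WScript.Echo Replace(cleaned, vbCrLf, "|")   ' one|two|three
```

Writing $1 back, rather than always inserting vbCrLf, leaves LF-only text with LF endings.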
Thank you for your reply, Potosino; it helped me look harder for a possible bug. I had the following experience:
Given the following two lines separated by a newline
test of r
what will come out?
I could not set the RegExp.Pattern to “r\nw”. It would match “r\r” (using just the CR) and it would match “\nw”. I finally tried them both together, and illogically it worked – “r\r\nw”.
I hate bugs! :)
chris
deneshac
14 Aug 07 at 4:33 pm
Hmm, I was just playing with VBScript regexps a few days ago, and the expression “r\r\n\w” is not a bug at all.
Windows files use two characters as the line delimiter: 1. a carriage return (CR), 2. a line feed (LF). That’s why the end-of-line delimiter is vbCrLf and not vbLf, and that’s why \r\n is a match.
\r matches the carriage return char, and \n matches the line feed ( or newline) char. Thus “r\r\n\w” matches your example string perfectly.
Dave
Dave M
21 Aug 07 at 11:11 pm
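Because \n matches only the line feed, a pattern such as r\nw will not match across a Windows CRLF break. One option, sketched below with a hypothetical NormalizeLineBreaks helper, is to convert CRLF and lone CR breaks to LF before matching; another is to make the carriage return optional in the pattern with \r?\n.

```vbscript
' NormalizeLineBreaks is a hypothetical helper: it turns CRLF and lone CR
' line breaks into a single LF so that \n in a pattern matches every break.
Function NormalizeLineBreaks(text)
    Dim re
    Set re = New RegExp
    re.Global = True
    ' CRLF is listed first so the pair becomes one LF, not two.
    re.Pattern = "\r\n|\r"
    NormalizeLineBreaks = re.Replace(text, vbLf)
End Function

Dim probe, tolerant, crlfText
crlfText = "test of r" & vbCrLf & "what will come out?"

Set probe = New RegExp
probe.Pattern = "r\nw"
WScript.Echo CStr(probe.Test(crlfText))                       ' False
WScript.Echo CStr(probe.Test(NormalizeLineBreaks(crlfText)))  ' True

' Alternatively, make the carriage return optional in the pattern itself.
Set tolerant = New RegExp
tolerant.Pattern = "r\r?\nw"
WScript.Echo CStr(tolerant.Test(crlfText))                    ' True
```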
I tried to use that from VBA, and – of course – slammed into a bug.
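Function RE_Match_And_Capture(re As String, someString As String) As Object
    ' Needs a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim reObj As New RegExp
    Dim result As Object
    With reObj
        .Pattern = re
        .MultiLine = True
    End With
    Set result = reObj.Execute(someString)
    Set RE_Match_And_Capture = result(0).SubMatches
End Function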
When my string is “hello” & vbCrLf & “world”, “^(.*)$” still just matches the “hello”. When I instead set MultiLine to False, the expression stops matching AT ALL! I then found that ^ and $ correctly match just at the beginning and end of the string, but that . still refuses to match newlines. I found that [^] is a working replacement for ., but still… this HAS to work. Or maybe I have just not seen some of the properties of the object? But autocompletion SHOULD have shown me all of them…
In Perl, there is the multiline flag /…/m, which seems to do exactly what MultiLine does (change the meaning of ^ and $ to be line-based), but what I really need is an equivalent of /…/s (which changes the meaning of . to match everything, including newlines). Apparently, VBScript REs don’t support the “(?s)” way of setting such flags either. So… how can I fix this problem?
divVerent
8 Sep 07 at 3:42 am
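Since VBScript offers neither a /s flag nor an inline (?s) modifier, the usual workaround is to replace . with a class that matches every character, such as [\s\S] (divVerent's [^] apparently plays the same role). A minimal sketch, assuming the goal is to capture a block of text that spans several lines; ExtractBlock and the BEGIN/END markers are hypothetical.

```vbscript
' Return the text between a BEGIN line and the next END line, even when
' it spans several lines, or "" if there is no such block.
' ExtractBlock and the BEGIN/END markers are hypothetical.
Function ExtractBlock(text)
    Dim re, matches
    Set re = New RegExp
    ' [\s\S] matches any character, CR and LF included, so it acts like
    ' . under Perl's /s flag; the lazy *? stops at the first END.
    re.Pattern = "BEGIN\r?\n([\s\S]*?)\r?\nEND"
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        ExtractBlock = matches(0).SubMatches(0)
    Else
        ExtractBlock = ""
    End If
End Function

Dim doc
doc = "header" & vbCrLf & "BEGIN" & vbCrLf & "line 1" & vbCrLf & _
      "line 2" & vbCrLf & "END" & vbCrLf & "footer"
WScript.Echo Replace(ExtractBlock(doc), vbCrLf, "|")   ' line 1|line 2
```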
Any ideas why the following two patterns do not both match the embedded numbers in the string?
Sub foo()
    processPattern "\d+"
    processPattern "\d*"
End Sub
Sub processPattern(pat As String)
    Dim re As VBScript_RegExp_55.RegExp
    Set re = New VBScript_RegExp_55.RegExp
    Dim matches As VBScript_RegExp_55.MatchCollection
    re.Pattern = pat
    Set matches = re.Execute("asdf234asdf")
    Debug.Print pat & " returns '" & matches(0) & "'"
End Sub
Alok Saldanha
2 Nov 07 at 2:56 pm
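Assuming the first pattern was meant to be \d+ (a + sign often turns into a space when text passes through URL encoding), the likely answer is that the engine reports the leftmost match. \d* can match zero digits, so it succeeds immediately at position 0 with an empty string, while \d+ needs at least one digit and skips ahead to 234. A small sketch that shows where each match starts; MatchInfo is a hypothetical helper.

```vbscript
' Describe the first match of pattern in text as "index:value".
' MatchInfo is a hypothetical helper; FirstIndex is zero-based.
Function MatchInfo(pattern, text)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = pattern
    Set matches = re.Execute(text)
    If matches.Count = 0 Then
        MatchInfo = "no match"
    Else
        MatchInfo = matches(0).FirstIndex & ":" & matches(0).Value
    End If
End Function

WScript.Echo MatchInfo("\d+", "asdf234asdf")   ' 4:234
WScript.Echo MatchInfo("\d*", "asdf234asdf")   ' 0: (an empty match at the start)
```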
First, regarding the debate about whether the way the VBScript Regular Expression engine handles the match expression “r\r\n\w” is a bug: that depends on your perspective. From the perspective of a person who learned regular expressions in the context of either Perl or a Unix grep tool, this is incorrect. Even Win32 Perl behaves consistently, by treating the CR/LF pair as a single atom.
Second, this behavior appears to extend to the System.Text.RegularExpressions.Regex class of the Microsoft .NET Framework.
David Gray
24 Mar 08 at 9:39 pm
What rubbish! They work perfectly for me.
Samantha Small
12 May 08 at 12:53 pm
re #2 from the article:
“Backslashed special characters do not work correctly inside brackets. For example, it ought to be possible to match across newlines with the patterns [.\n]* and [.\s]*, but this prevents the pattern from matching anything at all, even when no newlines are involved.”
I was able to get this to work by using the following pattern:
regEx.Pattern = "(.|[\r\n])*"
…also using a lazy quantifier:
regEx.Pattern = "(.|[\r\n])*?"
It’s not great but it works if you’re stuck with VBScript.
Scriptar
16 Dec 08 at 2:44 pm
Your observation follows from mine about the way the VBScript RegExp object treats newlines. While this behavior has the theoretical advantage that it allows you to treat carriage return (\r) and line feed (\n) characters separately, most of the time, such treatment is counterintuitive.
David Gray
29 Dec 09 at 10:50 pm
Yes, VBScript.RegExp pattern matching is very broken and doesn’t follow the POSIX standard at all for the “.” any-character.
I just spent an hour slamming my head against a wall trying to parse a multiline string that is the body of an e-mail, until I realized that “.*” does not match newlines, whether .MultiLine is set or not. Using the character set “[.\n]*” did not work either. Only the wrong solution of using alternation, “(.|\n)*”, worked.
POSIX | VBScript
“.*” ~= “(.|\n)*”
JakFrost
20 Oct 10 at 8:48 pm
“When Multiline is True, the meaning of the . metacharacter is different; it then matches every character including a newline.”
I don’t think this is right, it can be tested using the following code:
str = "hello" & vbCrLf & "world"
Set re = New RegExp
re.MultiLine = True
re.Pattern = ".+"
Set ms = re.Execute(str)
WScript.Echo ms(0) = "hello" & vbCr
' echoes -1 whether the MultiLine property is set to True or not.
Demon
27 Dec 11 at 2:54 am