VBScript regular expressions are slightly troublesome, though they certainly help turn VBScript into less of a joke when it comes to text processing. The syntax lacks some of the niceties of Perl or .NET regexes, but is complete enough to be very useful. This article shows you how to avoid potentially serious problems, and explains an undocumented feature.
Undocumented and incorrect behavior
- The documentation is incomplete. The RegExp object has an undocumented property, MultiLine, which changes the meaning of the ^ and $ anchors. Its default value is False, so ^ and $ match only at the very beginning and end of the string. When MultiLine is True, ^ and $ also match at the beginning and end of each line. This is the same as the m modifier in Perl and RegexOptions.Multiline in .NET. MultiLine does not affect the . metacharacter, which matches every character except a newline (\n) whatever the setting.
- Special characters inside brackets do not mean what you might expect. It seems natural to match across newlines with patterns such as [.\n]* and [.\s]*. Inside a character class, however, the . is a literal period and not a wildcard, so these classes match only periods and newlines (or whitespace), not arbitrary text. To match any character including a newline, use [\s\S] ("any whitespace or any non-whitespace character").
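The following minimal VBScript sketch demonstrates both points and is meant to be run with cscript. FirstMatch is a hypothetical helper written only for this illustration. The test string uses a bare line feed (vbLf) so that CR/LF handling does not affect the results.

```vbscript
Option Explicit

' Hypothetical helper: returns the first match of pattern in text,
' or "(no match)" when there is none.
Function FirstMatch(pattern, useMultiLine, text)
    Dim re, found
    Set re = New RegExp
    re.Pattern = pattern
    re.MultiLine = useMultiLine
    Set found = re.Execute(text)
    If found.Count = 0 Then
        FirstMatch = "(no match)"
    Else
        FirstMatch = found(0).Value
    End If
End Function

Dim s
s = "hello" & vbLf & "world"

' MultiLine = False: $ matches only at the end of the string and . cannot
' cross the newline, so the pattern fails.
WScript.Echo FirstMatch("^.*$", False, s)       ' (no match)
' MultiLine = True: $ also matches just before the newline.
WScript.Echo FirstMatch("^.*$", True, s)        ' hello
' Inside brackets, . is a literal period.
WScript.Echo FirstMatch("[.]", False, "abc")    ' (no match)
WScript.Echo FirstMatch("[.]", False, "a.b")    ' .
' [\s\S] matches any character, so the whole 11-character string matches.
WScript.Echo Len(FirstMatch("^[\s\S]*$", False, s))   ' 11
```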
How to avoid memory leaks
There is a memory-leak bug which has been reported elsewhere on the Internet. An expression with more than 10 subexpressions in Global mode can leak memory. To avoid this bug, don’t use a pattern with more than 10 subexpressions if the object’s Global property is set to True.
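If you build patterns dynamically, a guard along these lines can enforce that rule. This is only a sketch under stated assumptions. CountSubexpressions and NewSafeRegExp are hypothetical helpers, and the threshold of 10 is simply the figure reported above. The scanner is deliberately simple: it counts every unescaped ( outside a character class, including (?:...) groups, so it leans toward refusing Global mode rather than allowing it.

```vbscript
Option Explicit

' Hypothetical helper: a conservative count of subexpressions, i.e. every
' unescaped "(" that is not inside a [...] character class.
Function CountSubexpressions(pattern)
    Dim i, ch, inClass, count
    count = 0
    inClass = False
    i = 1
    Do While i <= Len(pattern)
        ch = Mid(pattern, i, 1)
        If ch = "\" Then
            i = i + 1                 ' skip the escaped character
        ElseIf inClass Then
            If ch = "]" Then inClass = False
        ElseIf ch = "[" Then
            inClass = True
        ElseIf ch = "(" Then
            count = count + 1
        End If
        i = i + 1
    Loop
    CountSubexpressions = count
End Function

' Hypothetical factory: refuses Global mode for patterns above the
' reported 10-subexpression limit.
Function NewSafeRegExp(pattern, wantGlobal)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    If wantGlobal And CountSubexpressions(pattern) > 10 Then
        Err.Raise vbObjectError + 100, "NewSafeRegExp", _
            "More than 10 subexpressions; not enabling Global mode."
    End If
    re.Global = wantGlobal
    Set NewSafeRegExp = re
End Function

WScript.Echo CountSubexpressions("(a)(b)\(c\)[(]")   ' 2
```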
Documentation links
Here are two links to the Microsoft VBScript RegExp documentation on MSDN:
I had an experience with VBScript where I was trying to eliminate trailing blank spaces and tabs from lines of text. VBScript would not match the regular expression pattern [ \t]+$. Instead, I had to use something like ([ \t]+)[\n\r]{1,2} and replace it with vbCrLf.
Thank you for your reply, Potosino; it helped me look harder for a possible bug. I had the following experience:
Given the following two lines separated by a newline
test of r
what will come out?
I could not get the pattern “r\nw” to match. “r\r” (using just the CR) would match, and so would “\nw”. I finally tried them both together and, illogically, it worked: “r\r\nw”.
I hate bugs! :)
chris
Hmm, I was just playing with VBScript regexps a few days ago, and the expression “r\r\n\w” is not a bug at all.
Windows files use two characters as the line delimiter: first a carriage return (CR), then a line feed (LF). That's why the end-of-line constant is vbCrLf and not vbLf, and that's why \r\n matches.
\r matches the carriage-return character, and \n matches the line-feed (or newline) character. Thus “r\r\n\w” matches your example string perfectly.
Dave
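The CR/LF explanation above can be checked directly. This VBScript sketch tests both patterns against the example text with a vbCrLf line break. IsMatch and StripTrailingBlanks are hypothetical helpers. The sketch also suggests one way to handle the trailing-whitespace problem from the first comment while leaving the line break intact: the lookahead (?=\r?\n|$), which VBScript 5.5 supports, checks for a line end without consuming it. A possible reason the plain [ \t]+$ seemed not to work is that, with the default MultiLine = False, $ matches only at the very end of the text.

```vbscript
Option Explicit

' Hypothetical helper: True when pattern matches anywhere in text.
Function IsMatch(pattern, text)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    IsMatch = re.Test(text)
End Function

' Hypothetical helper: removes spaces and tabs just before a line break or at
' the end of the text. The lookahead leaves the line break itself in place.
Function StripTrailingBlanks(text)
    Dim re
    Set re = New RegExp
    re.Pattern = "[ \t]+(?=\r?\n|$)"
    re.Global = True
    StripTrailingBlanks = re.Replace(text, "")
End Function

Dim s
s = "test of r" & vbCrLf & "what will come out?"

' "r" is followed by \r and then \n, so \n never comes directly after "r".
WScript.Echo CStr(IsMatch("r\nw", s))     ' False
WScript.Echo CStr(IsMatch("r\r\nw", s))   ' True
WScript.Echo Replace(StripTrailingBlanks("abc  " & vbCrLf & "def" & vbTab), vbCrLf, "<CRLF>")   ' abc<CRLF>def
```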
I tried to use that from VBA, and - of course - slammed into a bug.
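Function RE_Match_And_Capture(re As String, someString As String) As Object
    ' Needs a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim reObj As New RegExp
    Dim result As MatchCollection
    With reObj
        .Pattern = re
        .MultiLine = True
    End With
    Set result = reObj.Execute(someString)
    Set RE_Match_And_Capture = result(0).SubMatches
End Function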
When my string is “hello” & vbCrLf & “world”, “^(.*)$” still matches just “hello”. When I instead set MultiLine to False, the expression stops matching AT ALL! I then found that ^ and $ correctly match just at the beginning and end of the string, but that . still refuses to match newlines. I found that [^] is a working replacement for ., but still… this HAS to work. Or maybe I have just not seen some of the object's properties? But autocompletion SHOULD have shown me all of them…
In Perl, there is the multiline flag /…/m, which seems to do exactly what MultiLine does (change the meaning of ^ and $ to be line-based), but what I really need is an equivalent of /…/s (which changes the meaning of . to match everything, including newlines). Apparently, VBScript REs don't support the “(?s)” way of setting such flags either. So… how can I fix that problem?
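VBScript has neither a /s flag nor (?s) syntax, but you can get the same effect by rewriting the pattern. The sketch below is a workaround and not a built-in feature. DotAll is a hypothetical helper that replaces each unescaped . outside a character class with [\s\S], which matches any character, including CR and LF. Compared with the [^] trick, [\s\S] is the more common idiom. The scanner is kept simple and does not handle every corner of the class syntax.

```vbscript
Option Explicit

' Hypothetical helper emulating Perl's /s flag: every unescaped "." outside
' a [...] character class becomes [\s\S].
Function DotAll(pattern)
    Dim i, ch, inClass, out
    inClass = False
    out = ""
    i = 1
    Do While i <= Len(pattern)
        ch = Mid(pattern, i, 1)
        If ch = "\" Then
            ' Copy an escape sequence such as \. or \n unchanged.
            out = out & Mid(pattern, i, 2)
            i = i + 1
        ElseIf inClass Then
            If ch = "]" Then inClass = False
            out = out & ch
        ElseIf ch = "[" Then
            inClass = True
            out = out & ch
        ElseIf ch = "." Then
            out = out & "[\s\S]"
        Else
            out = out & ch
        End If
        i = i + 1
    Loop
    DotAll = out
End Function

Dim re, m
Set re = New RegExp
re.Pattern = DotAll("^(.*)$")     ' becomes ^([\s\S]*)$
Set m = re.Execute("hello" & vbCrLf & "world")
WScript.Echo Replace(m(0).SubMatches(0), vbCrLf, "<CRLF>")   ' hello<CRLF>world
```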
Any ideas why the following two patterns do not both match the embedded numbers in the string?
Sub foo()
processPattern "\d+"
processPattern "\d*"
End Sub
Sub processPattern(pat As String)
Dim re As VBScript_RegExp_55.RegExp
Set re = New VBScript_RegExp_55.RegExp
Dim matches As VBScript_RegExp_55.MatchCollection
re.Pattern = pat
Set matches = re.Execute("asdf234asdf")
Debug.Print pat & " returns '" & matches(0) & "'"
End Sub
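A likely explanation is that \d* can match zero digits. The engine tries position 0 first. There, "a" is not a digit, but \d* still succeeds with an empty match, so matches(0) is an empty string. \d+ requires at least one digit, so it moves ahead to 234. This VBScript sketch assumes the first pattern was meant to be \d+. It uses FirstMatchInfo, a hypothetical helper, to expose the FirstIndex and Length properties of the Match object.

```vbscript
Option Explicit

' Hypothetical helper: describes the first match of pat in the test string.
Function FirstMatchInfo(pat)
    Dim re, found
    Set re = New RegExp
    re.Pattern = pat
    Set found = re.Execute("asdf234asdf")
    ' FirstIndex is zero-based; Length is the number of characters matched.
    FirstMatchInfo = pat & " -> '" & found(0).Value & "' at " & _
        found(0).FirstIndex & ", length " & found(0).Length
End Function

WScript.Echo FirstMatchInfo("\d+")   ' \d+ -> '234' at 4, length 3
WScript.Echo FirstMatchInfo("\d*")   ' \d* -> '' at 0, length 0
```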
First, whether the VBScript regular expression engine's handling of the expression “r\r\n\w” is a bug depends on your perspective. To someone who learned regular expressions in Perl or a Unix grep tool, this behavior is incorrect. Even Win32 Perl behaves consistently, by treating the CR/LF pair as a single atom.
Second, this behavior appears to extend to the System.Text.RegularExpressions.Regex class of the Microsoft .NET Framework.
What rubbish! They work perfectly for me…