How to avoid VBScript regular expression gotchas

VBScript regular expressions are slightly troublesome, though they certainly help turn VBScript into less of a joke when it comes to text processing. The syntax lacks some of the niceties of Perl or .NET regexes, but is complete enough to be very useful. This article shows you how to avoid potentially serious problems, and explains an undocumented feature.

Undocumented and incorrect behavior

  1. The documentation is incomplete. The RegExp object has an undocumented property, Multiline, which changes how the ^ and $ anchors match. This property's default value is False, so by default ^ and $ match only at the start and end of the whole string. When Multiline is True, they also match at the start and end of each line. This is the same behavior as the /m flag in Perl and RegexOptions.Multiline in .NET. Multiline does not change the . metacharacter, which matches every character except a newline regardless of the setting; VBScript has no equivalent of Perl's /s flag.
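  2. Special characters do not behave as you might expect inside brackets. For example, it is tempting to match across newlines with the patterns [.\n]* and [.\s]*, but inside brackets the . is a literal period, so these classes match only periods and newlines (or periods and whitespace) and a larger pattern built on them will usually fail to match. To match any character including a newline, use a class such as [\s\S]* instead.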
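
A quick way to check how Multiline and the . metacharacter behave on your own machine is a small script like the one below. It is only a sketch: it assumes you run it under Windows Script Host (cscript.exe), and the sample text and variable names are made up for illustration. The first pattern should find a different number of matches depending on Multiline, while the [\s\S] class is a common workaround for matching any character, newlines included.

```vbscript
Option Explicit

Dim re, text, m

' Sample text (an assumption for this sketch): two lines separated by a bare LF.
text = "hello" & vbLf & "world"

Set re = New RegExp
re.Global = True

' Multiline changes where ^ and $ may match, not what . matches.
re.Pattern = "^\w+$"

re.Multiline = False
WScript.Echo "Multiline=False: " & re.Execute(text).Count & " match(es)"

re.Multiline = True
WScript.Echo "Multiline=True:  " & re.Execute(text).Count & " match(es)"

' [\s\S] means "whitespace or non-whitespace", i.e. any character,
' so this pattern can span the line break even with Multiline off.
re.Multiline = False
re.Pattern = "^[\s\S]*$"
For Each m In re.Execute(text)
    WScript.Echo "Matched " & Len(m.Value) & " of " & Len(text) & " characters"
Next
```

Save it as a .vbs file and run it with cscript to compare the counts for the two Multiline settings.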

How to avoid memory leaks

There is a memory-leak bug which has been reported elsewhere on the Internet. An expression with more than 10 subexpressions in Global mode can leak memory. To avoid this bug, don’t use a pattern with more than 10 subexpressions if the object’s Global property is set to True.
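
If you build patterns dynamically, a rough guard along these lines can warn you before you hit the limit described above. This is a sketch, not a parser: CountSubexpressions and SafeForGlobal are hypothetical helper names, the count ignores parentheses that appear inside bracketed character classes, and it assumes that (?...) groups are not counted as subexpressions.

```vbscript
Option Explicit

' Hypothetical helper: roughly counts the capturing groups in a pattern.
' Assumption: an unescaped "(" that is not followed by "?" opens one.
Function CountSubexpressions(pattern)
    Dim counter, m, total
    Set counter = New RegExp
    counter.Global = True
    ' First alternative consumes escaped characters such as \( so they are skipped;
    ' second alternative matches a "(" that does not start a (?...) group.
    counter.Pattern = "\\.|\((?!\?)"
    total = 0
    For Each m In counter.Execute(pattern)
        If m.Value = "(" Then total = total + 1
    Next
    CountSubexpressions = total
End Function

' Hypothetical helper: True when the RegExp object stays within the
' "no more than 10 subexpressions in Global mode" guideline.
Function SafeForGlobal(re)
    SafeForGlobal = (Not re.Global) Or (CountSubexpressions(re.Pattern) <= 10)
End Function

Dim re
Set re = New RegExp
re.Global = True
re.Pattern = "(\d+)-(\d+)"
If Not SafeForGlobal(re) Then
    WScript.Echo "Warning: more than 10 subexpressions with Global = True"
End If
```

The check is deliberately conservative and cheap; it only flags the combination the article warns about and leaves the pattern itself untouched.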

Documentation links

Here are two links to the Microsoft VBScript RegExp documentation on MSDN:

  1. VBScript Regular Expression Syntax
  2. The VBScript RegExp object

7 Responses to “How to avoid VBScript regular expression gotchas”


  1. 1 Potosino

    I had an experience with VBscript where I was trying to eliminate trailing blank spaces and tabs from lines of text. VBscript would not match the regular expression pattern [ \t]+$. Instead, I had to use something like ([ \t]+)[\n\r]{1,2} and replace it with vbCrLf.

  2. 2 deneshac

    Thank You for your reply, Potosino, it helped me look harder for a possible bug. I had the following experience:

    Given the following two lines separated by a newline

    test of r
    what will come out?

    I could not set the RegExp.Pattern to “r\nw”. It would match “r\r” (using just the CR) and it would match “\nw”. I finally tried them both together, and illogically it worked - “r\r\nw”.

    I hate bugs! :)

    chris

  3. 3 Dave M

    Hmm I was just playing with vbscript regexps a few days ago and the expression “r\r\n\w” is not a bug at all.

    Windows files use 2 chars as the line delimiter: 1. a 'carriage return' (Cr), 2. a 'line feed' (Lf). That's why the end-of-line delimiter is 'vbCrLf' and not 'vbLf', and that's why \r\n is a match.

    \r matches the carriage return char, and \n matches the line feed ( or newline) char. Thus “r\r\n\w” matches your example string perfectly.

    Dave

  4. 4 divVerent

    I tried to use that from VBA, and - of course - slammed into a bug.

    Dim reObj As New RegExp
    With reObj
        .Pattern = re
        .MultiLine = True
    End With
    Set result = reObj.Execute(someString)
    Set RE_Match_And_Capture = result(0).SubMatches

    When my string is “hello” & vbCrLf & “world”, “^(.*)$” still just matches the “hello”. When I instead set MultiLine to False, the expression stops matching AT ALL! I then found that ^ and $ then correctly match just at beginning and end of the string, but that . still refuses to match newlines. I found that [^] is a working replacement for ., but still… this HAS to work. Or maybe I have just not seen some of the properties of the object? But the autocompletion SHOULD have showed me all…

    In Perl, there is the multiline flag /…/m which seems to do exactly what MultiLine does (change the meaning of ^ and $ to be line-based), but what I really need is an equivalent of /…/s (changes the meaning of . to match everything, including newlines). Apparently, VBScript REs don’t support the “(?s)” way to set such flags either. So… how can I fix that problem?

  5. 5 Alok Saldanha

    Any ideas why the following two patterns do not both match the embedded numbers in the string?

    Sub foo()
        processPattern "\d+"
        processPattern "\d*"
    End Sub

    ' Prints the first match of pat in a fixed test string.
    Sub processPattern(pat As String)
        Dim re As VBScript_RegExp_55.RegExp
        Set re = New VBScript_RegExp_55.RegExp
        Dim matches As VBScript_RegExp_55.MatchCollection
        re.Pattern = pat
        Set matches = re.Execute("asdf234asdf")
        Debug.Print pat & " returns '" & matches(0) & "'"
    End Sub

  6. 6 David Gray

    First, regarding the debate about whether the way the VBScript Regular Expression engine handles the match expression “r\r\n\w” is a bug, that depends on your perspective. From the perspective of a person who learned them in the context of either Perl or a Unix grep tool, this is incorrect. Even Win32 Perl behaves consistently, by treating the CR/LF pair as a single atom.

    Second, this behavior appears to extend to the System.Text.RegularExpressions.Regex class of the Microsoft .NET Framework.

  7. 7 Samantha Small

    What Rubbish, they work perfectly for me….
