RegEx in Excel VBA Matching Extended ASCII Characters Wrong

I am trying to remove all non-printable and non-ASCII (extended) characters using the following RegEx in Excel VBA:

[^\x09\0A\0D\x20-\xFF]

This should theoretically match everything that is not a tab, line feed, carriage return, or printable ASCII character (character code between hexadecimal 20 and FF, or decimal 32 and 255). I have confirmed here that Microsoft VBScript regular expressions support the \xCC notation, where CC is an ASCII code in hexadecimal format.

The problem is that this regular expression matches every character above 127. It then throws an "invalid procedure call" error on match.Value when the matched character's code is above 127. Is it simply that VBScript regular expressions do not support character codes above 127? I can't seem to find this documented anywhere. Here is the full code:

regEx.Pattern = "[^\x09\0A\0D\x20-\xFF]"
regEx.IgnoreCase = True 'True to ignore case
regEx.Global = True 'True matches all occurrences, False matches only the first occurrence
regEx.MultiLine = True
If regEx.Test(Cells(curRow, curCol).Value) Then
    Set matches = regEx.Execute(Cells(curRow, curCol).Value)
    numReplacements = numReplacements + matches.Count
    For matchNum = matches.Count To 1 Step -1
        Cells(numReplacements - matchNum + 2, 16).Value = matches.Item(matchNum).Value
        Cells(numReplacements - matchNum + 2, 17).Value = Asc(matches.Item(matchNum).Value)
    Next matchNum
    Cells(curRow, curCol).Value = regEx.Replace(Cells(curRow, curCol).Value, replacements(pattNo))
End If

The first character that it matches is 0x96 (&ndash;). I can see this in the Watch window when I watch matches and expand it. However, when I try to watch matches.Item(matchNum).Value, I get this (see screenshot). Any ideas?

1 answer

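"Microsoft VBScript regular expressions support the \xCC notation, where CC is an ASCII code in hexadecimal"

The catch is that ASCII only covers \x00 to \x7F, and printable ASCII is only \x20 to \x7E.

Codes from \x80 upward are ANSI, not ASCII.

You can try this: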

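' Start the negated class with tab, LF, CR and printable ASCII
Dim ii, sExPatern: sExPatern = "[^\x09\x0A\x0D\x20-\x7E\"
' Append every ANSI character (codes 128 to 255) as a literal
For ii = 128 To 255
  sExPatern = sExPatern & Chr(ii)
Next
sExPatern = sExPatern & "]"
'...
regEx.Pattern = sExPatern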

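Keep in mind, though, that some ANSI codes are not printable either, for example 129, 131, 136, 144, 152, 160 (which ones depends on the ANSI code page configured in Windows).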
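
As a rough illustration of the approach above, the pattern building and the replacement could be wrapped in two small helpers. This is only a sketch: the names BuildKeepPattern and StripNonPrintable are made up for this example, the late-bound CreateObject("VBScript.RegExp") call stands in for however regEx is created in the question, and the code assumes a single-byte ANSI code page where Chr(128) through Chr(255) each return one character. Unlike the snippet above, the base pattern here has no trailing backslash.

```vba
Option Explicit

' Hypothetical helper: builds a negated class that keeps tab, LF, CR,
' printable ASCII (\x20-\x7E) and every ANSI character with code 128-255.
' Assumes a single-byte ANSI code page.
Function BuildKeepPattern() As String
    Dim code As Long
    Dim pattern As String

    pattern = "[^\x09\x0A\x0D\x20-\x7E"
    For code = 128 To 255
        ' Chr maps the ANSI code to the character VBA stores in the string
        pattern = pattern & Chr(code)
    Next code

    BuildKeepPattern = pattern & "]"
End Function

' Hypothetical helper: removes every character outside the class above.
Function StripNonPrintable(ByVal text As String) As String
    Dim regEx As Object

    ' Late binding, so no project reference to the RegExp library is needed
    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Global = True                 ' replace all matches, not just the first
    regEx.Pattern = BuildKeepPattern()

    StripNonPrintable = regEx.Replace(text, "")
End Function
```

For instance, calling StripNonPrintable("a" & Chr(7) & "b" & Chr(150) & "c") from the Immediate window should drop the BEL character (code 7) and keep the character with ANSI code 150, since that character is listed literally in the class. Any of the unprintable ANSI codes mentioned above would still be kept by this sketch; to drop them too, they would need to be skipped inside the loop.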


Source: https://habr.com/ru/post/1544046/

