I am trying to remove all non-printable and non-ASCII (extended) characters using the following RegEx in Excel VBA:
[^\x09\0A\0D\x20-\xFF]
This should theoretically correspond to everything that is not a tab, line feed, carriage return, or printable ASCII character (character code between the hexadecimal values 20 and FF, or decimal 32 and 255). I have confirmed here that Microsoft VBScript regular expressions support the \xCC notation, where CC is the ASCII code in hexadecimal format.
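To make the intent concrete, here is a minimal sketch of the same keep-set written as a plain character-code check instead of a regex. `IsKeptChar` is a hypothetical helper name used only for illustration and is not part of my actual code. It assumes it is handed a non-empty, single-character string.

```vba
' Hypothetical helper (illustration only): True when ch is a character
' the pattern is meant to keep, i.e. tab, line feed, carriage return,
' or a code between &H20 and &HFF.
' Assumption: ch is a non-empty, single-character string.
Function IsKeptChar(ByVal ch As String) As Boolean
    Dim code As Long
    ' AscW returns an Integer that can be negative for codes >= &H8000,
    ' so mask it into the range 0..65535 before comparing.
    code = AscW(ch) And &HFFFF&
    IsKeptChar = (code = 9) Or (code = 10) Or (code = 13) _
                 Or (code >= &H20 And code <= &HFF)
End Function
```

For example, `IsKeptChar(vbTab)` returns True, while `IsKeptChar(ChrW(&H2013))` returns False.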
The problem is that this regular expression matches every character above 127. Then it throws an "Invalid procedure call" error on match.value when the corresponding character code is above 127. Is it just that VBScript RegExes do not support character codes above 127? I can't seem to find this information anywhere. Here is the full code:
regEx.Pattern = "[^\x09\0A\0D\x20-\xFF]"
regEx.IgnoreCase = True 'True to ignore case
regEx.Global = True 'True matches all occurrences, False matches the first occurrence
regEx.MultiLine = True
If regEx.Test(Cells(curRow, curCol).Value) Then
Set matches = regEx.Execute(Cells(curRow, curCol).Value)
numReplacements = numReplacements + matches.Count
For matchNum = matches.Count To 1 Step -1
Cells(numReplacements - matchNum + 2, 16).Value = matches.Item(matchNum).Value
Cells(numReplacements - matchNum + 2, 17).Value = Asc(matches.Item(matchNum).Value)
Next matchNum
Cells(curRow, curCol).Value = regEx.Replace(Cells(curRow, curCol).Value, replacements(pattNo))
End If
The first character that it matches is 0x96 (&ndash;). I can see this in the Watch window when I watch matches and expand it. However, when I try to watch matches.Item(matchNum).Value, I get an error (see screenshot). Any ideas?