This was an exciting debugging experience. Can you tell the difference between the following two lines?
StringReplace["–", RegularExpression@ "[\\s\\S]" -> "abc"]
StringReplace["-", RegularExpression@ "[\\s\\S]" -> "abc"]
They give very different results when you evaluate them. It turns out that the string being replaced in the first line consists of a Unicode en dash, unlike the plain old ASCII hyphen in the second line.
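One quick way to confirm which character is which is to look at the character codes. This is only a small diagnostic sketch using the built-in ToCharacterCode, not part of the code that was failing:

```mathematica
(* Character codes expose the difference between the two look-alike strings. *)
ToCharacterCode["–"]  (* en dash, U+2013: gives {8211} *)
ToCharacterCode["-"]  (* ASCII hyphen-minus, U+002D: gives {45} *)
```

Any code above 127 means the string contains a non-ASCII character.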
With the Unicode string, the regular expression does not match. I intended the regular expression "[\s\S]" to mean "match any character (including newline)", but Mathematica seems to treat it as "match any ASCII character".
How can I fix the regex so that the first line above evaluates the same as the second? Also, is there an asciify filter that I can apply to strings first?
PS: Mathematica's documentation says that string pattern matching is built on top of the Perl-compatible regular expression library (http://pcre.org), so the problem I'm experiencing may not be specific to Mathematica.