R gsub with special characters

I tried to replace what I took to be a standard dash using gsub. The code I tested was:

gsub("-", "ABC", "reported – estimate")

It does nothing. I copied and pasted the dash into http://unicodelookup.com/#-/1 , and it looks like a dash. The site lists its hexadecimal, decimal, and other codes. The codes are for an en dash, and I'm trying to replace the en dash, but I'm out of luck. Suggestions?

(As a bonus, it would be useful if you could tell me whether there is a function for identifying special characters.)

I'm not sure whether SO's code formatting will change the dash, so here is the dash I use (–).

2 answers

You can replace the en dash by simply specifying it in the regex pattern.

 gsub("–", "ABC", "reported – estimate") 
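The original call had no effect because the ASCII hyphen-minus (U+002D) and the en dash (U+2013) are different code points, even though they look alike. The following is a minimal sketch using base R only; the variable names are illustrative and not part of the original answer. It writes the en dash as a `\u2013` escape so the script itself stays ASCII, and uses `utf8ToInt` to show the two code points.

```r
# The en dash written with a Unicode escape, so the source file stays ASCII.
en_dash <- "\u2013"
x <- paste("reported", en_dash, "estimate")

# Code points: the ASCII hyphen-minus is 45 (U+002D), the en dash is 8211 (U+2013).
hyphen_code <- utf8ToInt("-")
en_dash_code <- utf8ToInt(en_dash)

# A pattern containing the ASCII hyphen finds nothing to replace ...
unchanged <- gsub("-", "ABC", x, fixed = TRUE)

# ... while a pattern containing the en dash replaces it.
replaced <- gsub(en_dash, "ABC", x, fixed = TRUE)
print(replaced)
```

Writing the character as an escape can also help when an editor or a copy-and-paste step silently converts one kind of dash into another.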

You can match all hyphens, en- and em dashes with

 gsub("[-–—]", "ABC", "reported – estimate — more - text") 

See the IDEONE demo.

To check whether a string contains non-ASCII characters, use

 > s = "plus ça change, plus c'est la même chose"
 > gsub("[[:ascii:]]+", "", s, perl=T)
 [1] "çê"

See this IDEONE demo.

You will either get an empty result (if the string consists only of ASCII characters), or, as here, the "special" non-ASCII characters it contains.
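As an alternative to the regex, the same check can be sketched with code points. This is not the answer's method, just a base R variant under the assumption that the input is a single valid UTF-8 string; the variable names are illustrative. ASCII covers code points 0 to 127, so anything above that is non-ASCII.

```r
# Same sentence as above, with the accented letters written as escapes.
s <- "plus \u00e7a change, plus c'est la m\u00eame chose"

# utf8ToInt() returns one integer code point per character.
code_points <- utf8ToInt(s)

# TRUE if at least one character lies outside the ASCII range 0..127.
has_non_ascii <- any(code_points > 127L)

# Rebuild a string from only the non-ASCII code points.
non_ascii <- intToUtf8(code_points[code_points > 127L])

print(has_non_ascii)
print(non_ascii)
```

Note that `utf8ToInt` accepts one string at a time, so a character vector would need `vapply` or a loop around this check.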


You can use a negated character class to replace special characters.

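gsub('[^\\w]+', 'ABC', 'reported - estimate', perl = TRUE) will replace every run of non-word characters with ABC. [^\\w] is a pattern that matches any character that is not a word character (a letter, digit, or underscore). Note that this includes spaces, so the call above returns "reportedABCestimate". The quantifier should be + rather than *, because * also matches the empty string and would insert ABC between every pair of characters.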
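To make the behaviour of the negated class concrete, here is a small sketch comparing two patterns on a string that contains an en dash. The second pattern, `[^\\w\\s]+`, is an assumption added for illustration and is not part of the answer; it excludes whitespace from the match so that only the dash is replaced.

```r
# Test string with an en dash, written as an escape to keep the script ASCII.
x <- "reported \u2013 estimate"

# [^\\w]+ matches each run of non-word characters, spaces included,
# so " <en dash> " is replaced as a single unit.
all_non_word <- gsub("[^\\w]+", "ABC", x, perl = TRUE)

# [^\\w\\s]+ also excludes whitespace, so only the dash itself is replaced.
dash_only <- gsub("[^\\w\\s]+", "ABC", x, perl = TRUE)

print(all_non_word)
print(dash_only)
```

Which variant is appropriate depends on whether the surrounding spaces should survive the replacement.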


Source: https://habr.com/ru/post/1244233/

