How to remove unicode <U+00A6> from a string?
4 answers
I just want to remove the unicode <U+00A6> that is at the beginning of the line.
Then you do not need gsub; you can use sub with the pattern "^\\s*<U\\+\\w+>\\s*":
q <- "<U+00A6> 1000-66329"
sub("^\\s*<U\\+\\w+>\\s*", "", q)
Pattern details:
^ - start of the string
\\s* - zero or more whitespace characters
<U\\+ - the literal character sequence <U+
\\w+ - one or more letters, digits or underscores
> - a literal >
\\s* - zero or more whitespace characters
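If you also need to replace - with a space, add the |- alternative and use gsub, since several replacements are now expected and the replacement must be a space (as in akrun's answer):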
trimws(gsub("^\\s*<U\\+\\w+>|-", " ", q))
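Note that the patterns above match the literal text <U+00A6>, which is how R prints a character it cannot display in the current locale. If the string holds the actual broken-bar character instead, that pattern will not match. A minimal sketch covering both cases, assuming only U+00A6 needs to be stripped (strip_leading_u00a6 is a hypothetical helper name, not from the answers above):

```r
# Hypothetical helper: removes a leading U+00A6 marker in either form.
strip_leading_u00a6 <- function(x) {
  # Case 1: the escaped, literal text "<U+00A6>" at the start of the string.
  x <- sub("^\\s*<U\\+00A6>\\s*", "", x)
  # Case 2: the real broken-bar character U+00A6 at the start of the string.
  sub("^\\s*\u00a6\\s*", "", x)
}

strip_leading_u00a6("<U+00A6> 1000-66329")  # "1000-66329"
strip_leading_u00a6("\u00a6 1000-66329")    # "1000-66329"
```

Only sub is used here because each form can occur at most once at the start of the string.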
Instead of deleting it, you should convert it to the appropriate format ... You should set your locale to UTF-8 as follows:
Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
You may see the following message:
Warning message:
In Sys.setlocale("LC_CTYPE", "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored
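Rather than relying on the warning text, you can check the return value: Sys.setlocale() returns the new locale string on success and an empty string when the request cannot be honored. A minimal sketch, assuming the same "en_US.UTF-8" locale name as above (res is just an illustrative variable name):

```r
# Request the UTF-8 locale; the warning is suppressed because the
# return value is inspected directly instead.
res <- suppressWarnings(Sys.setlocale("LC_CTYPE", "en_US.UTF-8"))

if (identical(res, "")) {
  # The operating system does not provide this locale.
  message("Could not set LC_CTYPE to en_US.UTF-8")
} else {
  message("LC_CTYPE is now: ", res)
}
```

An empty res corresponds to the situation described by the warning above.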
In this case, you should use stringi::stri_trans_general(x, "zh").
Here, zh means Chinese. You need to know which language you should convert to. What he