Displaying UTF-8 Chinese Characters in R

I am trying to open a UTF-8 encoded CSV file that contains (traditional) Chinese characters in R. For some reason, R sometimes displays the data as Chinese characters and sometimes as Unicode escape codes.

For instance:

data <- read.csv("mydata.csv", encoding="UTF-8")
data

will output Unicode escape codes, and:

data <- read.csv("mydata.csv", encoding="UTF-8")
data[,1]

will display Chinese characters.

If I turn it into a matrix, it also displays Chinese characters, but if I try to look at the data with View(data) or fix(data), it reappears as Unicode escape codes.

I asked for advice from people who use a Mac (I use a PC with Windows 7); some of them got Chinese characters, others didn't. I tried saving the original data as a table instead and reading it into R that way, with the same result. I tried running the script in RStudio, Revolution R and RGui. I tried changing the locale (for example, to Chinese), but either R did not allow me to change it, or the result was gibberish instead of Unicode escape codes.

My current locale:

"LC_COLLATE=French_Switzerland.1252;LC_CTYPE=French_Switzerland.1252;LC_MONETARY=French_Switzerland.1252;LC_NUMERIC=C;LC_TIME=French_Switzerland.1252"

Any help getting R to consistently display Chinese characters would be greatly appreciated ...

+6
2 answers

This is not a bug, but rather a misunderstanding of how R converts the underlying base types (character and factor) when building a data.frame.

You can start with data <- read.csv("mydata.csv", encoding="UTF-8", stringsAsFactors=FALSE). This keeps your Chinese text as type character instead of converting it to factor, so printing it should show what you expect.

@nograpes: similarly, use x = c('中華民族'); x; y <- data.frame(x, stringsAsFactors=FALSE) and everything should be fine.
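To make the character-versus-factor point concrete, here is a minimal sketch. The variable names as_factor and as_char are illustrative and not from the answer. It only shows which type each column ends up with; how the text actually prints still depends on the console and locale.

```r
# Illustrative sketch: the same vector stored as a factor and as character.
x <- c("中華民族")

# stringsAsFactors = TRUE converts the strings to a factor
# (this was the default for data.frame() and read.csv() before R 4.0.0).
as_factor <- data.frame(x, stringsAsFactors = TRUE)

# stringsAsFactors = FALSE keeps the column as plain character.
as_char <- data.frame(x, stringsAsFactors = FALSE)

class(as_factor$x)  # "factor"
class(as_char$x)    # "character"

# Printing the column on its own, as the question does with data[,1].
print(as_char$x)
```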

+2

In my case, UTF-8 encoding does not work in my R, but the GB* encodings do. UTF-8 does work on Ubuntu. First you need to find out the default encoding of your OS and encode the file to match it. Excel cannot encode a file as UTF-8 properly, even if it claims to save it as UTF-8.
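One way to check the default encoding from inside R is sketched below. It uses only base R functions; the sample string is an assumption added for illustration, and the values returned will differ between Windows, macOS and Ubuntu.

```r
# Sketch: inspect the encoding the current R session uses by default.

# Character-type locale, e.g. a Windows code page or a UTF-8 locale on Ubuntu.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() reports whether the session's native encoding is UTF-8.
info <- l10n_info()
print(info[["UTF-8"]])

# Encoding() shows how R has marked a given string
# ("UTF-8", "latin1", "bytes" or "unknown").
s <- "\u4e2d\u83ef"   # illustrative sample: the first two characters of 中華民族
print(Encoding(s))
```

If info[["UTF-8"]] is FALSE, as is likely with the French_Switzerland.1252 locale in the question, characters outside the native code page cannot be represented in it.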

(1) Download 'open sheet'.

(2) Open the file in it with the correct encoding. You can cycle through the encoding options until the Chinese characters are displayed in the preview window.

(3) Save it as UTF-8 (if you want UTF-8). (UTF-8 is not a solution to every problem; you HAVE to know the default encoding on your system.)
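The same re-encoding can also be attempted in R rather than in a spreadsheet application. The sketch below is not part of the answer. It assumes the source file is GBK-encoded, that the platform's iconv supports "GBK", and uses temporary files as hypothetical stand-ins for the real input and output. It first writes a small GBK sample so that it can run on its own.

```r
# Sample content: a header line plus 中華民族, written with \u escapes
# so that the script itself is pure ASCII.
utf8_lines <- c("name", "\u4e2d\u83ef\u6c11\u65cf")

src <- tempfile(fileext = ".csv")   # hypothetical input file (GBK is an assumption)
dst <- tempfile(fileext = ".csv")   # hypothetical output file (UTF-8)

# Create the sample input by converting the text to GBK bytes.
writeLines(iconv(utf8_lines, from = "UTF-8", to = "GBK"), src, useBytes = TRUE)

# Step (2) equivalent: read the lines and declare the encoding they are really in.
raw_lines <- readLines(src, warn = FALSE)
converted <- iconv(raw_lines, from = "GBK", to = "UTF-8")

# Step (3) equivalent: write the text back out as UTF-8 bytes.
writeLines(converted, dst, useBytes = TRUE)

# The UTF-8 copy can then be read the way the question reads mydata.csv.
data <- read.csv(dst, encoding = "UTF-8", stringsAsFactors = FALSE)
print(identical(data$name, utf8_lines[2]))
```

If the guessed source encoding is wrong, iconv() returns NA for the lines it cannot convert, which plays the same role as the garbled preview in step (2).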

+1

Source: https://habr.com/ru/post/917659/

