R: Change the character encoding of columns in a data frame

I am studying how character encoding affects sorting. My question is:

How can I change one column of a data frame to another character encoding?

For context, here are the steps I have tried so far.

1) Create a data frame:

    d.enc <- data.frame(
      utf8  = c(" ", "_ ", " _"),
      mac   = c(" ", "_ ", " _"),
      label = c("space", "underscore space", "space underscore")
    )

2) Convert to character vectors and try to set the encoding:

    d.enc2 <- d.enc  # start from a copy; d.enc2 must exist before assigning columns

    d.enc2$utf8  <- as.character(d.enc$utf8)
    d.enc2$mac   <- as.character(d.enc$mac)
    d.enc2$label <- as.character(d.enc$label)

    Encoding(d.enc2$utf8) <- "UTF-8"
    Encoding(d.enc2$mac)  <- "MACINTOSH"

    Encoding(d.enc2$utf8)
    # [1] "unknown" "unknown" "unknown"
    Encoding(d.enc2$mac)
    # [1] "unknown" "unknown" "unknown"

3) This is not what I was hoping for. I would expect:

    # [1] "UTF-8" "UTF-8" "UTF-8"

and

    # [1] "MACINTOSH" "MACINTOSH" "MACINTOSH"
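A likely explanation, going by the documented behaviour of `Encoding()`: strings that contain only ASCII characters are never marked with a declared encoding, and the only marks R keeps are "latin1", "UTF-8", "bytes" and "unknown". The sketch below is an illustration, not a definitive answer. It uses a made-up non-ASCII value (`"caf\u00e9"`, which is not in the data above) and converts to "latin1" instead of "MACINTOSH", because "latin1" is a conversion target I can rely on across platforms.

```r
# ASCII-only strings, like the ones in d.enc: the mark does not stick.
x_ascii <- c(" ", "_ ", " _")
Encoding(x_ascii) <- "UTF-8"
Encoding(x_ascii)
# [1] "unknown" "unknown" "unknown"

# Hypothetical non-ASCII value (an assumption, not part of the original data).
x_utf8 <- "caf\u00e9"
Encoding(x_utf8)
# [1] "UTF-8"

# iconv() re-encodes the bytes; Encoding<- only changes the declared mark.
x_latin1 <- iconv(x_utf8, from = "UTF-8", to = "latin1")
Encoding(x_latin1)
# [1] "latin1"

# The underlying bytes differ between the two versions.
charToRaw(x_utf8)    # 63 61 66 c3 a9
charToRaw(x_latin1)  # 63 61 66 e9
```

As far as I know, R has no mark for "MACINTOSH", so even after `iconv(x, from = "UTF-8", to = "MACINTOSH")` the result of `Encoding()` would still be "unknown"; the bytes change, the mark does not.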

4) Are my encodings supported? (Run on a Mac)

    temp <- iconvlist()
    temp[399]
    # [1] "UTF-8"
    temp[338]
    # [1] "MACINTOSH"

They seem to be supported.
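The positions returned by `iconvlist()` differ between systems, so indices such as 399 and 338 are unlikely to match elsewhere. A small sketch that checks by name instead (the variable names `wanted` and `supported` are mine, for illustration only):

```r
# Look encodings up by name; positions in iconvlist() vary between systems.
available <- iconvlist()
wanted <- c("UTF-8", "MACINTOSH")

# TRUE where the platform's iconv reports the encoding as available.
supported <- wanted %in% available
names(supported) <- wanted
supported
```

The result depends on the platform's iconv implementation, so no output is shown here.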

5) Once I can change the encodings, I would like to run the following to see how the sort order changes:

    library(dplyr)
    arrange(d.enc2, desc(utf8))
    arrange(d.enc2, desc(mac))
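As far as I understand it, the order produced by sorting is driven by the collation rules in use (the locale) rather than by the encoding mark on a column. A minimal base R sketch, assuming that the comparison of interest is byte-order (C locale) sorting versus locale-aware sorting:

```r
x <- c(" ", "_ ", " _")

# method = "radix" sorts character vectors in the C locale (byte order),
# so this result does not depend on the session's locale.
by_bytes <- sort(x, decreasing = TRUE, method = "radix")
by_bytes
# [1] "_ " " _" " "

# The default method follows the current collation locale, so the
# result here may differ from one system to another.
Sys.getlocale("LC_COLLATE")
by_locale <- sort(x, decreasing = TRUE)
by_locale
```

Comparing `by_bytes` with `by_locale` on the same machine shows whether the locale's collation treats the space and the underscore differently from plain byte order.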

6) I expect the result to look something like this, but in a different order, depending on which column is used for sorting:

      utf8 mac            label
    1   _   _  underscore space
    2    _   _ space underscore
    3                     space

Thanks for any tips!


Source: https://habr.com/ru/post/1245013/

