I am studying how character encoding affects sort order. My question is:
How can I change one column of a data frame to another character encoding?
For context, here are the steps I have tried so far.
1) Create a data frame:
    d.enc <- data.frame(
      utf8  = c(" ", "_ ", " _"),
      mac   = c(" ", "_ ", " _"),
      label = c("space", "underscore space", "space underscore")
    )
2) Convert to character vectors and try to set the encoding:
    # Start from a copy so that d.enc2 exists before its columns are assigned
    d.enc2 <- d.enc
    d.enc2$utf8  <- as.character(d.enc$utf8)
    d.enc2$mac   <- as.character(d.enc$mac)
    d.enc2$label <- as.character(d.enc$label)
    Encoding(d.enc2$utf8) <- "UTF-8"
    Encoding(d.enc2$mac)  <- "MACINTOSH"
    Encoding(d.enc2$utf8)
3) This is not what I was hoping for. I would expect:
    # [1] "UTF-8" "UTF-8" "UTF-8"

and

    # [1] "MACINTOSH" "MACINTOSH" "MACINTOSH"
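A possibly relevant detail, as far as I understand the documentation for `Encoding()`: it only marks strings, it does not convert them, and ASCII-only strings are never marked. The sketch below uses a hypothetical non-ASCII value (`"caf\u00e9"`, not part of my data) to show the difference between marking and converting with `iconv()`. The conversion is guarded because whether `"MACINTOSH"` is available depends on the platform.

```r
x <- c(" ", "_ ", " _")
Encoding(x) <- "UTF-8"
Encoding(x)        # ASCII-only strings stay "unknown"

# Hypothetical non-ASCII example value; \u escapes produce a UTF-8 string
y <- "caf\u00e9"
Encoding(y)        # "UTF-8"
charToRaw(y)       # 63 61 66 c3 a9

# Converting the bytes (not just the mark) is a job for iconv().
# toRaw = TRUE returns the converted bytes as a list of raw vectors.
if ("MACINTOSH" %in% iconvlist()) {
  y_mac <- iconv(y, from = "UTF-8", to = "MACINTOSH", toRaw = TRUE)[[1]]
  print(y_mac)
}
```

If this reading is right, my test strings (spaces and underscores) have identical bytes in both encodings, so converting them would not change anything.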
4) Are my encodings supported? (I am running this on a Mac.)
    temp <- iconvlist()
    temp[399]
    # [1] "UTF-8"
    temp[338]
    # [1] "MACINTOSH"
They seem to be supported.
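Since the positions in `iconvlist()` differ between systems, a lookup by name is probably a more portable way to run the same check. This is only a sketch; `wanted` and `supported` are names I made up for illustration.

```r
# Look the encodings up by name instead of by position in the list
wanted <- c("UTF-8", "MACINTOSH")
supported <- toupper(wanted) %in% toupper(iconvlist())
names(supported) <- wanted
supported
```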
5) Once I can change the encodings, I would like to run the following to see how the sort order changes:
    library(dplyr)
    arrange(d.enc2, desc(utf8))
    arrange(d.enc2, desc(mac))
6) I expect the result to look something like this, but in a different order, depending on which column is used for sorting:
      utf8 mac            label
    1   _   _  underscore space
    2    _   _ space underscore
    3                     space
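To make the comparison concrete, here is a minimal base R sketch of the two orderings I can already produce without changing any encoding. It assumes that `sort(method = "radix")` compares strings as in the C locale (that is how I read the documentation), while the default method follows the session's collation locale.

```r
x <- c(" ", "_ ", " _")

# Bytes of each value: space is 0x20, underscore is 0x5f
lapply(x, charToRaw)

# C-locale (byte-wise) ordering, descending
by_bytes <- sort(x, method = "radix", decreasing = TRUE)
by_bytes
# [1] "_ " " _" " "

# Locale-dependent ordering; the result may differ between systems
by_locale <- sort(x, decreasing = TRUE)
by_locale

Sys.getlocale("LC_COLLATE")
```

The byte-wise result matches the order I sketched above, which makes me suspect that the collation locale, rather than the encoding of the column, is what actually determines the order.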
Thanks for any tips!