The letter "y" appears after the "i" when sorting alphabetically

When I call sort(x), where x is a character vector, the letter "y" jumps to the middle, immediately after the letter "i":

    > letters
     [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
    [21] "u" "v" "w" "x" "y" "z"
    > sort(letters)
     [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
    [21] "t" "u" "v" "w" "x" "z"

The reason may be that I am in Lithuania, and this is a "Lithuanian" sorting of letters, but I need the normal a-z order. How can I change the sorting method back to normal from within R code?

I am using R 2.15.2 on Win7.

+45
alphabetical-sort r locale
Jan 22 '13 at 12:14
2 answers

You need to change the locale where R is running. Either do this for your entire Windows installation (which seems suboptimal), or within R sessions using:

 Sys.setlocale("LC_COLLATE", "C") 

Instead of "C" you can use any other valid locale string, but "C" should give you the a-z order you want for these letters.

See ?locales for more details.
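One side effect of the "C" collation is worth knowing before relying on it: strings are compared by character code, so all uppercase ASCII letters sort before all lowercase ones. A minimal sketch (the variable names `old`, `mixed` and `sorted_c` are only for illustration):

```r
# Remember the current collation so it can be restored afterwards
old <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")

# Mixed-case input: under "C", uppercase letters come before lowercase ones
mixed <- c("b", "A", "a", "B")
sorted_c <- sort(mixed)
print(sorted_c)  # "A" "B" "a" "b"

# Put the original collation back
Sys.setlocale("LC_COLLATE", old)
```

If you need case-insensitive a-z order, a locale such as English may suit you better than "C"; which locale names are valid depends on your operating system.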

I think it's worth noting the sister function Sys.getlocale(), which queries the current setting of a locale category. With it you can save the current collation, change it, and restore it afterwards:

    (locCol <- Sys.getlocale("LC_COLLATE"))
    Sys.setlocale("LC_COLLATE", "lt_LT")
    sort(letters)
    Sys.setlocale("LC_COLLATE", locCol)
    sort(letters)
    Sys.getlocale("LC_COLLATE")

    ## giving:
    > (locCol <- Sys.getlocale("LC_COLLATE"))
    [1] "en_GB.UTF-8"
    > Sys.setlocale("LC_COLLATE", "lt_LT")
    [1] "lt_LT"
    > sort(letters)
     [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n"
    [16] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "z"
    > Sys.setlocale("LC_COLLATE", locCol)
    [1] "en_GB.UTF-8"
    > sort(letters)
     [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
    [16] "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
    > Sys.getlocale("LC_COLLATE")
    [1] "en_GB.UTF-8"

This save-and-restore pattern is what with_collate(), shown in @Hadley's answer, does for you; it is a bit more concise once you have installed devtools.
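If you would rather not install an extra package, the same save-and-restore steps can be wrapped in a small function. This is only a sketch: `sort_with_collation` is a hypothetical helper name, not a base R or devtools function, and it assumes the locale string you pass is valid on your system.

```r
# Hypothetical helper: sort x under a given collation, then restore the old one
sort_with_collation <- function(x, locale = "C") {
  old <- Sys.getlocale("LC_COLLATE")
  # on.exit() runs even if sort() fails, so the original setting always comes back
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", locale)
  sort(x)
}

before <- Sys.getlocale("LC_COLLATE")
sorted <- sort_with_collation(rev(letters))  # "C" collation by default
after <- Sys.getlocale("LC_COLLATE")         # same as `before`
```

Using on.exit() rather than a second Sys.setlocale() call at the end means the session's collation is restored even when the sort raises an error.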

+39
Jan 22 '13 at 12:21

If you want to do this temporarily, devtools provides a with_collate function:

    library(devtools)
    with_collate("C", sort(letters))
    #  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
    # [20] "t" "u" "v" "w" "x" "y" "z"
    with_collate("lt_LT", sort(letters))
    #  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r"
    # [20] "s" "t" "u" "v" "w" "x" "z"
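On newer versions of R (3.3.0 or later, so not the 2.15.2 mentioned in the question), there is another option that leaves the locale untouched: the "radix" method of sort(), which orders character vectors by C-locale collation whatever the session's LC_COLLATE is. A minimal sketch, assuming such an R version is available:

```r
# method = "radix" sorts character vectors in C-locale order,
# independent of the LC_COLLATE setting of the session (R >= 3.3.0)
sorted_radix <- sort(rev(letters), method = "radix")
print(identical(sorted_radix, letters))  # TRUE
```

As with Sys.setlocale("LC_COLLATE", "C"), this puts uppercase letters before lowercase ones.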
+34
Jan 22 '13 at 13:26


