Here is my basic understanding of what is happening.
First, some encoding facts:
| Character | UTF-8 | CP1252 |
|-----------|-------|--------|
| v | 76 | 76 |
| æ | c3 a6 | e6 |
| g | 67 | 67 |
| t | 74 | 74 |
| Ã | c3 83 | c3 |
| ¦ | c2 a6 | a6 |
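To cross-check the table, here is a short sketch that prints each character's UTF-8 and CP1252 bytes. It is written in Python rather than R because only the bytes matter, and it relies only on Python's built-in `utf-8` and `cp1252` codecs. The helper `hex_bytes` is a hypothetical name used only for this illustration.

```python
# Characters from the table above.
chars = ["v", "æ", "g", "t", "Ã", "¦"]

def hex_bytes(b: bytes) -> str:
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{x:02x}" for x in b)

# Map each character to its (UTF-8 bytes, CP1252 bytes) as hex strings.
table = {
    ch: (hex_bytes(ch.encode("utf-8")), hex_bytes(ch.encode("cp1252")))
    for ch in chars
}

for ch, (utf8_hex, cp1252_hex) in table.items():
    print(f"{ch}  UTF-8: {utf8_hex:<6}  CP1252: {cp1252_hex}")
```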
Now the mechanics:
The Windows machine uses CP1252, as seen from the output of `sessionInfo()`. Thus, the string `vægt` in the R script is represented as the bytes `76 e6 67 74`. This is confirmed by `charToRaw("vægt")`. If we then convert it to UTF-8, we get `76 c3 a6 67 74`. However, the fact that these bytes represent UTF-8 is lost. `rawToChar()` later converts these bytes back to a string, again assuming CP1252. Since `c3` is `Ã` and `a6` is `¦` in CP1252, we get `vÃ¦gt`.
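The steps above can be reproduced at the byte level. The sketch below is in Python, not R. It assumes the script text starts out as CP1252, as on the Windows machine. The final decode plays the role that `rawToChar()` plays in a CP1252 session: it reads UTF-8 bytes as if they were CP1252. This is an analogy, not R's implementation.

```python
# The string as stored by a CP1252 session (assumption: script text is CP1252).
s = "vægt"
cp1252_bytes = s.encode("cp1252")
print(cp1252_bytes.hex(" "))  # 76 e6 67 74  (what charToRaw() reports)

# Convert to UTF-8. The bytes change, but a bare byte sequence carries no
# label saying which encoding it is in.
utf8_bytes = cp1252_bytes.decode("cp1252").encode("utf-8")
print(utf8_bytes.hex(" "))  # 76 c3 a6 67 74

# Read the UTF-8 bytes back assuming CP1252: c3 -> "Ã", a6 -> "¦".
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # vÃ¦gt
```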
On Mac and Linux, on the other hand, the default encoding is UTF-8, so the encoding of the bytes and the encoding assumed when reading them back agree. I suspect, however, that the same phenomenon as on Windows could be caused by explicitly changing the encoding R uses.
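The contrast between the two platforms can be sketched as a single function, again in Python rather than R. The parameter `native` stands in for the session's encoding, and `round_trip` is a hypothetical helper, not an R or Python API. The function converts text to UTF-8 bytes and reads them back assuming `native`. When `native` is UTF-8 the text survives intact; when it is CP1252, whether by default or by an explicit setting, the same garbling appears.

```python
def round_trip(text: str, native: str) -> str:
    """Hypothetical model: encode text as UTF-8, then decode the bytes
    assuming the session's native encoding `native`."""
    utf8_bytes = text.encode("utf-8")
    return utf8_bytes.decode(native)

print(round_trip("vægt", "utf-8"))   # vægt   (Mac/Linux default: consistent)
print(round_trip("vægt", "cp1252"))  # vÃ¦gt  (CP1252 session: mismatch)
```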
I do not think this is a bug.