I am using R 3.1.1 on 32-bit Windows 7. I am having a lot of trouble reading some text files that I want to run text analysis on. According to Notepad++, the files are encoded as "UCS-2 Little Endian". (grepWin, a tool whose name says it all, reports the file as "Unicode".)
The problem is that I cannot read the file, even when I specify its encoding. (The characters are the standard Spanish Latin set, so they should be easy to handle with CP1252 or something similar.)
> Sys.getlocale()
[1] "LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONETARY=Spanish_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252"
> readLines("filename.txt")
[1] "ÿþE" "" "" "" "" ...
> readLines("filename.txt",encoding="UTF-8")
[1] "\xff\xfeE" "" "" "" "" ...
> readLines("filename.txt",encoding="UCS2LE")
[1] "ÿþE" "" "" "" "" "" "" ...
> readLines("filename.txt",encoding="UCS2")
[1] "ÿþE" "" "" "" "" ...
Any ideas?
Thanks!!
edit: Passing "UTF-16", "UTF-16LE" and "UTF-16BE" as the encoding fails in the same way.
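One thing that may be worth checking: as far as the R documentation describes it, the `encoding` argument of `readLines()` only marks the resulting strings as Latin-1 or UTF-8 and does not re-encode the input, whereas an encoding given to the connection itself does. The sketch below is a possible way to test that, not a confirmed fix. The temporary file and the sample text are hypothetical stand-ins for the real file, written as UTF-16LE with a byte order mark, which is assumed here to be roughly what Notepad++ labels "UCS-2 Little Endian".

```r
# Hypothetical sample text and temporary file, used only to mimic the real input.
path <- tempfile(fileext = ".txt")
sample_text <- "El ni\u00f1o come\r\nAdi\u00f3s\r\n"

# Write UTF-16LE bytes preceded by the byte order mark FF FE.
body <- iconv(sample_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), path)

# Give the encoding to the connection so the input is re-encoded while reading,
# instead of passing it to readLines().
con <- file(path, open = "r", encoding = "UTF-16LE")
lines <- readLines(con)
close(con)

# Strip a leading byte order mark character in case it survived the conversion.
lines[1] <- sub("^\ufeff", "", lines[1])
print(lines)
```

With the real file, the same pattern would be `readLines(con <- file("filename.txt", encoding = "UTF-16LE"))` followed by `close(con)`. "UCS-2LE" is another encoding name that could be tried if "UTF-16LE" is not accepted by the local iconv.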