Manipulating files with non-English names in R

When using R functions to manage files on Windows, for example dir(), file names that contain non-English characters, such as Cyrillic, are shown as a sequence of "?".

Similarly, when using file.rename(), if the new name contains non-English characters, the file ends up with a name made of unreadable characters, apparently because the name is interpreted in a different encoding.

There are many functions related to encoding the contents of a file, but how can we handle file names?

To reproduce the problem:
Outside of R, create a “hi .txt” file in the working directory; then in R:

 dir()
 # [1] "??????.txt"
 # ...

Note that setting the locale:

 Sys.setlocale(category = "LC_ALL", locale = "Russian")

does not help.

Note: I use R 3.1.2 for Windows on an English-language Windows 8.1, and Cyrillic names are displayed correctly in the Windows console (cmd.exe).

2 answers

Try converting the file name between encodings: iconv(".txt", "UTF-8", "CP1251")
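As a rough illustration of what that call does, the sketch below converts a file name from UTF-8 to CP1251 and back. The file name is hypothetical (six Cyrillic letters written as \u escapes), because the literal in the answer did not survive; whether the converted name actually helps dir() or file.rename() depends on the session's native encoding, which is an assumption to check on your own system.

```r
# Hypothetical six-letter Cyrillic file name, written with \u escapes so
# that the script itself stays ASCII-only.
name_utf8 <- "\u043f\u0440\u0438\u0432\u0435\u0442.txt"

# Convert the UTF-8 bytes to the Windows Cyrillic code page (CP1251),
# as suggested in the answer.
name_cp1251 <- iconv(name_utf8, from = "UTF-8", to = "CP1251")

# Each Cyrillic letter takes two bytes in UTF-8 but one byte in CP1251.
print(nchar(name_utf8, type = "bytes"))
print(nchar(name_cp1251, type = "bytes"))

# Converting back recovers the original UTF-8 string.
round_trip <- iconv(name_cp1251, from = "CP1251", to = "UTF-8")
print(identical(charToRaw(round_trip), charToRaw(name_utf8)))
```

Comparing raw bytes rather than printed strings avoids depending on how the console renders the characters.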

Character conversion between encodings:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/iconv.html

Iconv library:
http://www.delorie.com/gnu/docs/recode/recode_30.html


One simple solution is to change the locale, if you only need to run the script once or twice and you know the target language.

 Sys.setlocale(category = "LC_ALL", locale = "Russian")
 x1 <- read.table("C:\\.txt", head = TRUE)  # works fine with R 3.1.2
 Sys.setlocale(category = "LC_ALL", locale = "English")
 x2 <- read.table("C:\\.txt", head = TRUE)  # raises an error

If you need to read files from a server, I strongly recommend using Python or another scripting language to handle the Unicode path. If you insist on R, try the following (cf. "Set the locale for the default system to UTF-8"):

 Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252")
 x3 <- read.table("C:\\.txt", head = TRUE)  # may or may not warn, but reads the table into x3

However, you should still process the contents of this table with a package such as stringi, and remember to restore the locale after the read operation if necessary.
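One way to make sure the locale is restored is to wrap the switch in a small helper. The sketch below is only an illustration: with_ctype is a hypothetical name, and it changes LC_CTYPE rather than LC_ALL, on the assumption that the character-handling category is the one that matters here and because a single category can be saved and restored reliably.

```r
# Hypothetical helper: evaluate `expr` under a temporary LC_CTYPE locale
# and restore the previous one afterwards, even if `expr` fails.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) when the locale is unavailable.
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(new, "")) {
    stop("locale not available on this system: ", locale)
  }
  # `expr` is a lazily evaluated argument, so it runs only now,
  # after the locale has been switched.
  force(expr)
}

# Usage with the "C" locale, which every platform provides; on Windows
# one would pass "Russian" and a read.table() call instead.
before <- Sys.getlocale("LC_CTYPE")
inside <- with_ctype("C", Sys.getlocale("LC_CTYPE"))
after <- Sys.getlocale("LC_CTYPE")
print(c(before = before, inside = inside, after = after))
```

Registering the restore with on.exit() before switching means the original locale comes back even when the read fails with an error.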

== Update ==

(cf. https://stat.ethz.ch/pipermail/r-help/2011-May/278206.html) This may also be a console display issue, as described in the R-FAQ:

3.6 I don't see characters with accents at the R console, for example in ?text.

You need to specify a font in Rconsole (see Q5.2) that supports the encoding in use. This used to be a problem in earlier versions of Windows, but now it is hard to find a font that does not.

Support for these characters within Rterm depends on the environment (the terminal window and the shell, including the locale and code page) in which it runs, as well as on the font used by the terminal window. These usually rely on legacy DOS settings and need to be changed.

Given this, please let me know whether you can enter Russian file names in the R console when using the "read" functions. Thanks.


Source: https://habr.com/ru/post/971183/

