In R, how to read a file with a custom line ending (eol)

I have a text file to read in R (and save to a data.frame). The file is organized into several rows and columns, and both the field separator ("sep") and the line ending ("eol") are custom.

Problem: the custom eol, i.e. "\t&nd" (without quotes), cannot be set in read.table(...) (or read.csv(...), read.csv2(...), ...) nor in fread(...), and I cannot find a solution.

I have searched here ("[r] read eol" and something else that I don't remember) and didn't find a solution. The only suggestion was to preprocess the file and change the eol, which is impossible in my case because some fields can contain things like \n, \r, \n\r, ", ..., and that is the reason for using a custom eol in the first place.

Thanks!

1 answer

You can approach this in two different ways:

A. If the file is not too wide, you can read the rows with scan(), split each one into columns with strsplit(), and then combine them into a data.frame. Example:

# Provide a reproducible example of the file ("raw.txt" here) you are starting with
your_text <- "a~b~c!1~2~meh!4~5~wow"
write(your_text, "raw.txt"); rm(your_text)

eol_str <- "!" # the character the rows divide on (scan() accepts only a single-byte sep)
sep_str <- "~" # whatever string the columns divide on (matched literally below)

# read and parse the text file
# scan gives you a character vector of row strings (one string per row)
# sapply + strsplit gives you a list of row vectors (one element per column)
row_list <- sapply(scan("raw.txt", what = character(), sep = eol_str),
                   strsplit, split = sep_str, fixed = TRUE)

df <- data.frame(do.call(rbind, row_list[-1]))
row.names(df) <- NULL
names(df) <- row_list[[1]]

df
#   a b   c
# 1 1 2 meh
# 2 4 5 wow
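Note that scan() only accepts a single-character sep, so it cannot split on a multi-character eol such as the "\t&nd" from the question, and it also splits at line breaks inside unquoted fields. A minimal sketch that avoids both problems, assuming the whole file fits in memory, that "\t&nd" means a tab followed by &nd (use "\\t&nd" if it is a literal backslash and t), that fields are separated by "~", and that the first row holds the column names. read_custom_eol is a hypothetical helper name, not an existing function:

```r
# Minimal sketch (hypothetical helper): read a file whose rows end with a
# multi-character string, keeping embedded \n, \r and " as ordinary data.
# Assumptions: the whole file fits in memory, eol "\t&nd" = tab + "&nd",
# fields are separated by "~", and the first row holds the column names.
read_custom_eol <- function(path, eol = "\t&nd", sep = "~") {
  # Read the entire file as one string; nothing is split on line breaks here
  raw <- readChar(path, nchars = file.info(path)$size, useBytes = TRUE)

  # Split into rows on the literal eol (fixed = TRUE: not a regular expression)
  rows <- strsplit(raw, eol, fixed = TRUE)[[1]]
  # Drop whitespace-only pieces, e.g. a newline after the final eol
  rows <- rows[nzchar(trimws(rows))]

  # Split each row into fields on the literal separator
  fields <- strsplit(rows, sep, fixed = TRUE)
  header <- fields[[1]]
  n <- length(header)

  body <- lapply(fields[-1], function(f) {
    # strsplit() drops a trailing empty field, so pad short rows with ""
    if (length(f) > n) stop("a row has more fields than the header")
    c(f, rep("", n - length(f)))
  })

  df <- as.data.frame(do.call(rbind, body), stringsAsFactors = FALSE)
  names(df) <- header
  df
}

# Usage: a temporary file whose third column contains a newline and quotes
path <- tempfile(fileext = ".txt")
cat("a~b~c\t&nd1~2~line one\nline two\t&nd4~5~say \"hi\"\t&nd", file = path)
df <- read_custom_eol(path)
str(df)
```

All columns come back as character; numeric ones can be converted afterwards, for example with type.convert() on each column.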

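B. Alternatively, as @BondedDust suggested, you can preprocess the file with a command-line tool called from R via system() before passing it to read.table. For find-and-replace from the command line, see: https://askubuntu.com/questions/20414/find-and-replace-text-within-a-file-using-commands. Since your fields contain \n and \r\n, you would first replace those with a placeholder that never appears in the data, then replace your custom eol with a newline, read the file, and finally restore the placeholders in the resulting data.frame.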
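The same placeholder substitution can also be sketched inside R instead of with an external tool through system(). This assumes the file fits in memory, that the eol itself contains no \n or \r, that the hypothetical placeholder strings <<NL>> and <<CR>> never occur in the data, and that the column separator is a single character ("|" here, since read.table only accepts a one-byte sep). preprocess_and_read is a hypothetical helper name:

```r
# Minimal sketch (hypothetical helper): hide embedded line breaks behind
# placeholders, turn the custom eol into "\n", parse with read.table(),
# then restore the original line breaks in the character columns.
preprocess_and_read <- function(path, eol = "\t&nd", sep = "|",
                                nl_token = "<<NL>>", cr_token = "<<CR>>") {
  raw <- readChar(path, nchars = file.info(path)$size, useBytes = TRUE)

  # 1. Hide line breaks that belong to the data (literal replacements)
  raw <- gsub("\r", cr_token, raw, fixed = TRUE)
  raw <- gsub("\n", nl_token, raw, fixed = TRUE)

  # 2. Turn the custom eol into a real newline
  raw <- gsub(eol, "\n", raw, fixed = TRUE)

  # 3. Parse; quote = "" and comment.char = "" keep stray " and # literal
  df <- read.table(text = raw, sep = sep, header = TRUE, quote = "",
                   comment.char = "", stringsAsFactors = FALSE)

  # 4. Put the original line breaks back into the character columns
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    col <- gsub(nl_token, "\n", col, fixed = TRUE)
    gsub(cr_token, "\r", col, fixed = TRUE)
  })
  df
}

# Usage: a temporary file whose third column contains a newline and quotes
path <- tempfile(fileext = ".txt")
cat("a|b|c\t&nd1|2|line one\nline two\t&nd4|5|say \"hi\"\t&nd", file = path)
df <- preprocess_and_read(path)
str(df)
```

Unlike the strsplit() approach, read.table() converts numeric columns for you, at the cost of having to pick placeholders that are guaranteed not to appear in the data.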


Source: https://habr.com/ru/post/1584071/

