What is the fastest way to get CSV output to a data frame?

I have a program that outputs strings of CSV data that I want to load into a data frame. I am currently loading data as follows:

    tmpFilename <- "tmp_file"
    system(paste(procName, ">", tmpFilename), wait=TRUE)
    myData <- read.csv(tmpFilename)  # (I also pass in colClasses and nrows for efficiency)

However, redirecting the output to a file only to read it back seemed inefficient (the program emits about 30 MB, so I want to handle it as efficiently as possible). I thought textConnection would solve this, so I tried:

    con <- textConnection(system(procName, intern=TRUE))
    myData <- read.csv(con)

This is much slower, though. While the running time of the first solution grows linearly with input size, the performance of the textConnection solution deteriorates exponentially. The slowest part is creating the textConnection. The read.csv call itself actually finishes faster here than in the first solution, since it reads from memory.

My question is: is creating a file only to call read.csv on it my best option with regard to speed? Is there a way to speed up the creation of a textConnection? Bonus: why is creating a textConnection so slow?

1 answer

The "fastest way" probably involves using something other than read.csv. However, if you want to stick with read.csv, using a pipe may be the way to go:

 myData <- read.csv(pipe(procName)) 

It avoids reading the full text output into an intermediate buffer (at least until read.csv receives it).
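The pipe approach can also be combined with the colClasses and nrows hints mentioned in the question. Here is a minimal sketch, assuming a Unix-like system where `cat` is available. The temporary file, the `cat` command, and the two-column layout are made up for illustration and stand in for the real program behind procName.

```r
# Stand-in for the real program: a temporary CSV file echoed by `cat`.
# (Assumption: Unix-like shell; substitute your own command for `cmd`.)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = rnorm(1000)),
          file = csv_path, row.names = FALSE)
cmd <- paste("cat", shQuote(csv_path))

# read.csv opens the pipe, reads the command's standard output,
# and closes the connection again when it is done.
my_data <- read.csv(
  pipe(cmd),
  colClasses = c("integer", "numeric"),  # skip column type guessing
  nrows = 1000                           # maximum number of rows to read
)

str(my_data)
```

Whether the hints make a measurable difference depends on the data, so it is worth timing both variants with system.time on your real output.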

Some time comparisons:

    > write.csv(data.frame(x=rnorm(1e5)), row.names=FALSE, file="norm.csv")
    > system.time(d <- read.csv("norm.csv"))
       user  system elapsed
      0.398   0.004   0.402
    > system.time(d <- read.csv(textConnection(system("cat norm.csv", intern=TRUE))))
       user  system elapsed
     56.159   0.106  56.095
    > system.time(d <- read.csv(pipe("cat norm.csv")))
       user  system elapsed
      0.475   0.012   0.531
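As one possible instance of "something other than read.csv", base R's scan can read from the same kind of pipe when the column types are known in advance. This is only a sketch under assumptions: a Unix-like system with `cat`, a single numeric column as in the timing example, and a temporary file standing in for the real program. It has not been benchmarked here, so any speed gain should be checked with system.time on your own data.

```r
# Stand-in for the real program, mirroring the timing example above.
# (Assumption: Unix-like shell with `cat`.)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = rnorm(1e5)), file = csv_path, row.names = FALSE)
cmd <- paste("cat", shQuote(csv_path))

# Open the pipe explicitly so that we control when it is closed.
con <- pipe(cmd, open = "r")
cols <- scan(
  con,
  what = list(x = numeric()),  # one numeric column named x
  sep = ",",
  skip = 1,                    # skip the header line
  quiet = TRUE
)
close(con)

# scan returns a named list; convert it to a data frame.
my_data <- as.data.frame(cols)
```

Unlike read.csv, scan needs the column types spelled out in `what` and does not parse the header, which is why the header line is skipped and the column name is given by hand.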

Source: https://habr.com/ru/post/945218/