I have a large file with 6 million rows, and I'm trying to read the data in chunks for processing so that I don't hit the RAM limit. Here is my code (note that `temp.csv` is just a dummy file with 41 entries):
```r
library(data.table)

infile <- file("data/temp.csv", open = "r")
headers <- as.character(read.table(infile, header = FALSE, nrows = 1,
                                   sep = ",", stringsAsFactors = FALSE))
while (length(temp <- read.table(infile, header = FALSE, nrows = 10,
                                 sep = ",", stringsAsFactors = FALSE)) > 0) {
  temp <- data.table(temp)
  setnames(temp, colnames(temp), headers)
  setkey(temp, Id)
  print(temp[1, Tags])
}
print("hi")
close(infile)
```
Everything runs smoothly until the last iteration. I get this error message:
```
Error in read.table(infile, header = FALSE, nrows = 10, sep = ",",
  stringsAsFactors = FALSE) : no lines available in input
In addition: Warning message:
In read.table(infile, header = FALSE, nrows = 10, sep = ",",
  stringsAsFactors = FALSE) :
  incomplete final line found by readTableHeader on 'data/temp.csv'
```
Presumably this happens because the last iteration has only 1 row of records left while `read.table` expects 10?
All the data is actually read in order. Surprisingly, even in the final iteration, `temp` is still converted to a `data.table`. But `print("hi")` and everything after it are never executed. Is there something I can do to get around this?
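One workaround I've been experimenting with is wrapping the read in `tryCatch` so the "no lines available" error becomes a loop-exit condition instead of aborting the script. Here is a minimal self-contained sketch of that idea (it writes its own 11-row dummy file in place of `data/temp.csv`, and uses base R naming instead of `data.table` just to keep the example small):

```r
# Create a dummy CSV: a header line plus 11 data rows,
# so chunked reads of 10 produce one full chunk and one partial chunk.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Id,Tags", paste(1:11, letters[1:11], sep = ",")), tmp)

infile <- file(tmp, open = "r")
headers <- as.character(read.table(infile, header = FALSE, nrows = 1,
                                   sep = ",", stringsAsFactors = FALSE))
chunks <- list()
repeat {
  # Once the connection is exhausted, read.table() errors with
  # "no lines available in input"; tryCatch turns that into NULL.
  temp <- tryCatch(
    read.table(infile, header = FALSE, nrows = 10, sep = ",",
               stringsAsFactors = FALSE),
    error = function(e) NULL
  )
  if (is.null(temp) || nrow(temp) == 0) break
  names(temp) <- headers
  chunks[[length(chunks) + 1]] <- temp
}
close(infile)

length(chunks)     # number of chunks read
nrow(chunks[[2]])  # rows in the final, partial chunk
```

With 11 data rows and `nrows = 10`, the loop reads one chunk of 10 rows and one chunk of 1 row, then exits cleanly, so code after the loop still runs.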
Thanks.