I know fread is relatively new, but it really gives big performance improvements. I want to know whether you can select both rows and columns from the file you are reading, similar to what read.csv.sql does. I know that with the select option of fread you can choose which columns to read, but what about reading only the rows that meet certain criteria?
For example, could something like the following be implemented with fread?
read.csv.sql(file, sql = "select V2, V4, V7, V8, V9, V10 from file where V5 == 'CE' and V10 >= 500", header = FALSE, sep = '|', eol = "\n")
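fread has no argument for a row condition (it has select/drop for columns and skip/nrows for row ranges), so one way to reproduce the query above is to let select drop the unused columns while parsing and then apply the WHERE condition as a data.table subset. This is a minimal sketch, assuming the data.table package is installed. The sample file and its contents are made up for illustration, and the row filter runs after parsing, so every row is still read.

```r
library(data.table)

# Hypothetical sample: 10 '|'-separated columns, no header, like the real files.
file <- tempfile(fileext = ".txt")
writeLines(c(
  "1|a|x|10|CE|u|p|q|r|600",
  "2|b|y|20|AB|v|p|q|r|700",
  "3|c|z|30|CE|w|p|q|r|100",
  "4|d|x|40|CE|u|p|q|r|500"
), file)

# Parse only the output columns plus V5, which is needed for the filter.
dt <- fread(file, sep = "|", header = FALSE, select = c(2, 4, 5, 7:10))
# Name the columns by their position in the file so they match the SQL query.
setnames(dt, c("V2", "V4", "V5", "V7", "V8", "V9", "V10"))

# Equivalent of WHERE V5 == 'CE' AND V10 >= 500, keeping the SELECT list.
result <- dt[V5 == "CE" & V10 >= 500, .(V2, V4, V7, V8, V9, V10)]
print(result)
```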
If this is not yet possible, is it advisable to read all of the data and then use subset() or similar to get the final result? Or would that defeat the purpose of using fread?
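If the goal is to avoid parsing non-matching rows at all, fread can read the output of a shell command through its cmd argument (available in data.table 1.11.6 and later), so an external tool can drop rows before fread sees them. This sketch assumes a Unix-like system with awk on the PATH; the awk condition is a hand translation of the SQL WHERE clause, and the sample file is hypothetical.

```r
library(data.table)

# Hypothetical sample file with the same layout as above.
file <- tempfile(fileext = ".txt")
writeLines(c(
  "1|a|x|10|CE|u|p|q|r|600",
  "2|b|y|20|AB|v|p|q|r|700",
  "3|c|z|30|CE|w|p|q|r|100",
  "4|d|x|40|CE|u|p|q|r|500"
), file)

# awk splits each line on '|' and prints it only if field 5 is CE and
# field 10 is >= 500, so only matching rows reach fread.
awk_cmd <- sprintf("awk -F'|' '$5 == \"CE\" && $10 >= 500' %s", shQuote(file))

dt <- fread(cmd = awk_cmd, sep = "|", header = FALSE, select = c(2, 4, 7:10))
setnames(dt, c("V2", "V4", "V7", "V8", "V9", "V10"))
print(dt)
```

Whether this beats reading everything and subsetting depends on how selective the filter is, so it is worth timing both approaches on one real file.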
For reference, I have to read about 800 files, each of which contains about 100,000 rows and 10 columns. Any input is welcome.
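For many files, a common pattern is to wrap the read-and-filter step in a function, apply it to every file with lapply, and stack the results with rbindlist, so that only each file's filtered rows are retained after it is processed. This is a minimal sketch; read_filtered is a hypothetical helper name and the two tiny files stand in for the roughly 800 real ones.

```r
library(data.table)

# Hypothetical helper: parse the needed columns of one file, then filter its rows.
read_filtered <- function(path) {
  dt <- fread(path, sep = "|", header = FALSE, select = c(2, 4, 5, 7:10))
  setnames(dt, c("V2", "V4", "V5", "V7", "V8", "V9", "V10"))
  dt[V5 == "CE" & V10 >= 500, .(V2, V4, V7, V8, V9, V10)]
}

# Two tiny sample files standing in for the real ones.
data_dir <- tempfile("files_")
dir.create(data_dir)
writeLines(c("1|a|x|10|CE|u|p|q|r|600", "2|b|y|20|AB|v|p|q|r|700"),
           file.path(data_dir, "f1.txt"))
writeLines(c("3|c|z|30|CE|w|p|q|r|100", "4|d|x|40|CE|u|p|q|r|500"),
           file.path(data_dir, "f2.txt"))

files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
# Name the list by file so rbindlist can record where each row came from.
pieces <- lapply(setNames(files, basename(files)), read_filtered)
result <- rbindlist(pieces, idcol = "file")
print(result)
```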
Thanks.