How can I read selected columns from very large RDS files?
Sample data is generated as:
set.seed(123)
df <- data.frame(replicate(10, sample(0:2000, 15 * 10^5, rep = TRUE)),
                 replicate(10, stringi::stri_rand_strings(1000, 5)))
head(df)
saveRDS is used to save the file:
saveRDS(df, 'df.rds')
The file size can be checked with the following commands:
file.info('df.rds')$size
utils:::format.object_size(29935125, "auto")
The saved file is read back using readRDS:
readRDS('df.rds')
However, some of my files are several GB in size, and a given processing step needs only a few columns. Can I read selected columns from an RDS file?
Note: I already have RDS files that were created after a significant amount of processing. I want to find the best way to read selected columns from these existing RDS files.
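As background on why this is hard: an RDS file is a single serialized R object, and readRDS deserializes the whole object, so there is no built-in way to pull out individual columns without loading everything. One workaround (a minimal sketch, not the only approach; the directory name df_cols and the column-per-file layout are illustrative assumptions) is to pay the full readRDS cost once, write each column to its own small .rds file, and afterwards read back only the columns a given analysis needs:

```r
# Toy stand-in for a data frame loaded once via readRDS.
df <- data.frame(a = 1:3, b = letters[1:3], c = c(2.5, 3.5, 4.5))

# One-time step: write each column as its own .rds file
# (df_cols is a hypothetical directory name).
dir.create("df_cols", showWarnings = FALSE)
for (nm in names(df)) {
  saveRDS(df[[nm]], file.path("df_cols", paste0(nm, ".rds")))
}

# Later: rebuild a data frame from only the selected columns,
# without touching the files for the other columns.
wanted <- c("a", "c")
subset_df <- as.data.frame(
  setNames(
    lapply(wanted, function(nm) readRDS(file.path("df_cols", paste0(nm, ".rds")))),
    wanted
  )
)
print(subset_df)
```

For existing multi-GB files this still requires one full readRDS per file to do the conversion; after that, per-column reads are cheap. Packages such as fst offer a similar column-selective read natively, if converting the storage format is acceptable.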