Load new files from a directory and append them to saved data

I have an R script that reads multiple text files from a directory and saves the data in compressed .rda format. It looks like this:

    #!/usr/bin/Rscript --vanilla

    args <- commandArgs(TRUE)   ## args[1] is the folder name
    outname <- paste(args[1], ".rda", sep = "")
    files <- list.files(path = args[1], pattern = "\\.txt$", full.names = TRUE)

    tmp <- list()
    if (file.exists(outname)) {
      message("found ", outname)
      load(outname)
      tmp <- get(args[1])                  # previously read data
      files <- setdiff(files, names(tmp))  # keep only files not yet read
    }

    if (length(files) == 0) {              # setdiff() returns character(0), never NULL
      message("no new files")
    } else {
      ## read the new files into a list of data frames
      results <- plyr::llply(files, read.table, .progress = "text")
      names(results) <- files
      assign(args[1], c(tmp, results))
      message("now saving... ", args[1])
      save(list = args[1], file = outname)
    }
    message("all done!")

The files are quite large (about 15 MB each, and usually around 50 of them), so running this script typically takes several minutes, a significant part of which is spent writing the .rda output.

I regularly add new data files to the directory and would like to append them to the previously saved and compressed data. That is what the script above does, by checking whether an output file with that name already exists, but the final step of saving the .rda file is still quite slow.

Is there a smarter way to do this, perhaps with a package that keeps track of which files have already been read, and that saves the result faster?
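For example, is the right approach something along the lines of the following rough sketch (untested, and the cache directory name is made up), where each input file is cached as its own .rds so that only new files ever have to be written?

    ## sketch: one .rds per input file, so files already read are never re-saved
    cache_dir <- file.path(args[1], "cache")
    dir.create(cache_dir, showWarnings = FALSE)
    files <- list.files(path = args[1], pattern = "\\.txt$", full.names = TRUE)
    for (f in files) {
      rds <- file.path(cache_dir, paste0(basename(f), ".rds"))
      if (!file.exists(rds))
        saveRDS(read.table(f), rds, compress = FALSE)   # write only files not cached yet
    }
    ## reading everything back is then just:
    results <- lapply(list.files(cache_dir, full.names = TRUE), readRDS)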

I saw that knitr uses tools:::makeLazyLoadDB to save its cached computations, but that function is not documented, so I'm not sure whether it is advisable to use it here.
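From skimming the knitr source, my (possibly mistaken) understanding is that it is used roughly like this, writing a filebase.rdb/filebase.rdx pair that can later be loaded lazily; this is only a sketch built on internal, undocumented functions:

    ## rough, untested sketch based on reading knitr's caching code
    e <- new.env()
    assign(args[1], c(tmp, results), envir = e)
    tools:::makeLazyLoadDB(e, args[1])   # writes args[1].rdb and args[1].rdx
    ## later, in a fresh session, objects are loaded only when first accessed:
    lazyLoad(args[1])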

1 answer

For intermediate files that I need to read (or write) often, I use

    save(..., compress = FALSE)

which speeds things up considerably.
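
A rough illustration (the object and file names are made up, and actual timings depend on the data and the disk):

    ## illustrative only: ~50 largish matrices, roughly mimicking the setup above
    x <- replicate(50, matrix(rnorm(2e5), ncol = 10), simplify = FALSE)
    system.time(save(x, file = "x_default.rda"))                  # compress = TRUE is the default
    system.time(save(x, file = "x_fast.rda", compress = FALSE))   # much faster to write, larger on disk
    system.time(load("x_fast.rda"))                               # reading back also tends to be faster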


Source: https://habr.com/ru/post/915074/
