raster package fills up the hard disk

I process raster time series (MODIS NDVI images) to calculate the mean and standard deviation series. Each annual series consists of 23 NDVI .tif images of 508 MB each, so the total is about 11 GB to process. Below is the script for one year; I have to repeat this for several years.

 library(raster)
 library(rgeos)
 filesndvi <- list.files(pattern="NDVI.tif", full.names=TRUE)
 filesetndvi10 <- stack(filesndvi)
 names(filesetndvi10)
 avgndvi10 <- mean(filesetndvi10)
 desviondvi10 <- filesetndvi10 - avgndvi10
 sumdesvioc <- sum(desviondvi10^2)
 varndvi10 <- sumdesvioc / nlayers(filesetndvi10)
 sdndvi10 <- sqrt(varndvi10)
 cvndvi10 <- sdndvi10 / avgndvi10

Problem: the process keeps writing to the hard drive until it is completely full, and I don't know where on the disk these files are written. The only way I have found to clean up the disk is to reboot; rm() did not work, and closing RStudio did not work either. I am using R 3.0.2 with RStudio 0.98.994 on Ubuntu 14.04, on an Asus UX31 with 4 GB RAM and a 256 GB HD. Any thoughts on cleaning up the disk after calculating each year, without rebooting, would be very welcome. Thanks.

+5
4 answers

Two more things to consider. First, make fewer intermediate files by combining steps into calc or overlay calls (there are not many opportunities for that here, but there are some). This can also speed up the computation, as there will be less reading from and writing to disk. Second, take control of deleting specific files. In the calc and overlay functions you can supply file names, so you know which files to delete when you no longer need them. You can also delete the temp files explicitly; it is good practice to first remove the R objects that point to those files. Here is an example based on yours.

 library(raster)
 # example data
 set.seed(0)
 ndvi <- raster(nc=10, nr=10)
 n1 <- setValues(ndvi, runif(100) * 2 - 1)
 n2 <- setValues(ndvi, runif(100) * 2 - 1)
 n3 <- setValues(ndvi, runif(100) * 2 - 1)
 n4 <- setValues(ndvi, runif(100) * 2 - 1)
 filesetndvi10 <- stack(n1, n2, n3, n4)
 nl <- nlayers(filesetndvi10)
 avgndvi10 <- mean(filesetndvi10)
 desviondvi10_2 <- overlay(filesetndvi10, avgndvi10, fun=function(x, y) (x - y)^2, filename='over_tmp.grd')
 sdndvi10 <- calc(desviondvi10_2, fun=function(x) sqrt(sum(x) / nl), filename='calc_tmp.grd')
 cvndvi10 <- overlay(sdndvi10, avgndvi10, fun=function(x, y) x / y, filename='cvndvi10.grd', overwrite=TRUE)
 f <- filename(avgndvi10)
 rm(avgndvi10, desviondvi10_2, sdndvi10)
 file.remove(c(f, extension(f, '.gri')))
 file.remove(c('over_tmp.grd', 'over_tmp.gri', 'calc_tmp.grd', 'calc_tmp.gri'))

To find out where the temp files are written, look at

 rasterOptions() 

or get the path as a variable like this:

 dirname(rasterTmpFile()) 

To specify the path, use

 rasterOptions(tempdir='a path') 
+6

I struggle with the same thing, but I have a few tricks that help. First off, get more memory. RAM and HD space are cheap and make a dramatic difference when working with large R objects such as rasters. Second, use removeTmpFiles() from the raster package. You can set it to remove tmp files older than a certain number of hours; for example, removeTmpFiles(0.5) will remove tmp files older than 30 minutes. Just make sure you set this longer than the time for which the files will still be needed. Third, use something like the snippet below to set rasterOptions(). Be careful when setting the chunk and memory sizes; these values may not work for your system, but you may find something better optimized than the defaults. Finally, use rm() and gc() to clean up as you go (see the sketch after the snippet below). Hope this helps, but if you find a better solution, let me know.

 # 'drive' holds your drive letter, e.g. "c"
 tmpdir_name <- paste(c(drive, ":/RASTER_TEMP/"), collapse='')
 if (!file.exists(tmpdir_name)) {
   dir.create(tmpdir_name)
 }
 rasterOptions(datatype = "FLT4S", progress = "text", tmpdir = tmpdir_name,
               tmptime = 4, timer = TRUE, tolerance = 0.5,
               chunksize = 1e+08, maxmemory = 1e+09)
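
As a rough sketch of the rm()/gc()/removeTmpFiles() pattern described above (the year range and the process_year() helper are placeholders, not part of the original answer):

 # hypothetical per-year cleanup loop; process_year() stands in for
 # the actual raster calculations
 for (yr in 2010:2014) {
   result <- process_year(yr)
   writeRaster(result, filename = paste0('cvndvi_', yr, '.tif'), overwrite = TRUE)
   rm(result)               # drop the R object pointing at the temp files first
   gc()                     # free the memory it held
   removeTmpFiles(h = 0.5)  # then delete raster temp files older than 30 minutes
 }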
+2

I found another way to deal with this problem that worked better for me, based on this answer. In my case I use a parallel loop, and I do not want to delete all files from the temporary directory, because that could delete the temp files of other processes.

@RobertH's answer, which names each individual temp file, is good, but I was not sure whether manually naming files forces raster to write even small files to the hard drive instead of keeping them in RAM, slowing the process down (the raster documentation says it only writes to disk if a file does not fit in RAM).

So I create a temporary directory from inside the loop or parallel process, tied to a unique name taken from the data being processed in the loop; in my case that is the value of single@data$OWNER:

 # creates a unique filepath for the temp directory
 dir.create(file.path("c:/", single@data$OWNER), showWarnings = FALSE)
 # sets the temp directory
 rasterOptions(tmpdir = file.path("c:/", single@data$OWNER))

Paste your processing code here, and then at the end of the loop delete the entire folder:

 # removes the entire temp directory without affecting other running processes
 unlink(file.path("c:/", single@data$OWNER), recursive = TRUE)
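
For context, here is a minimal sketch of how this pattern might sit inside a parallel loop. The foreach/doParallel setup, the owners_list input, and the process_owner() helper are illustrative assumptions, not part of the original answer:

 library(raster)
 library(foreach)
 library(doParallel)

 registerDoParallel(cores = 2)

 # 'owners_list' and 'process_owner()' are hypothetical placeholders
 results <- foreach(single = owners_list, .packages = "raster") %dopar% {
   # per-process temp directory keyed to the data being processed
   tmp <- file.path("c:/", single@data$OWNER)
   dir.create(tmp, showWarnings = FALSE)
   rasterOptions(tmpdir = tmp)

   out <- process_owner(single)

   # remove only this process's temp files
   unlink(tmp, recursive = TRUE)
   out
 }

Because each worker sets its own tmpdir, the final unlink() touches only that worker's files.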
+2
source

I noticed that in RobertH's helpful answer, the last suggested command has an extra "e". It should be rasterOptions(tmpdir = 'path')

instead of rasterOptions(tempdir = 'path')

0
