Does bigmemory always use a backing file?

We are trying to use the bigmemory package with foreach for parallel analysis. However, the as.big.matrix function always seems to use a backing file. Our workstations have enough memory; is there a way to use bigmemory without a backing file?

The call x.big.desc <- describe(as.big.matrix(x)) is rather slow, because it writes the data to C:\ProgramData\boost_interprocess\. It is somehow even slower than saving x directly. Is it as.big.matrix itself that has slow I/O?

The call x.big.desc <- describe(as.big.matrix(x, backingfile = "")) is pretty fast, but it also saves a copy of the data in the %TMP% directory. We think it is fast because R starts a background writing process instead of writing the data before returning. (We can see the write activity in Task Manager after the R prompt returns.)

Is there a way to use bigmemory with RAM only, so that every worker in the foreach loop can access the data through RAM?

Thanks for the help.

1 answer

If you have enough RAM, just use standard R matrices. To pass only a part of the matrix to each worker in the cluster, use rds files.

An example that computes colSums on 3 cores:

# Functions for splitting
CutBySize <- function(m, nb) {
  int <- m / nb

  upper <- round(1:nb * int)
  lower <- c(1, upper[-nb] + 1)
  size <- c(upper[1], diff(upper))

  cbind(lower, upper, size)
}
seq2 <- function(lims) seq(lims[1], lims[2])

# The matrix
bm <- matrix(1, 10e3, 1e3)
ncores <- 3
intervals <- CutBySize(ncol(bm), ncores)
# Save each part in a different file
tmpfile <- tempfile()
for (ic in seq_len(ncores)) {
  saveRDS(bm[, seq2(intervals[ic, ])], 
          paste0(tmpfile, ic, ".rds"))
}
# Parallel computation with reading one part at the beginning
cl <- parallel::makeCluster(ncores)
doParallel::registerDoParallel(cl)
library(foreach)
colsums <- foreach(ic = seq_len(ncores), .combine = 'c') %dopar% {
  bm.part <- readRDS(paste0(tmpfile, ic, ".rds"))
  colSums(bm.part)
}
parallel::stopCluster(cl)
# Checking results
all.equal(colsums, colSums(bm))
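
To see what the splitting helpers above produce, here is a small sketch that applies them to a toy case of 10 columns and 3 workers. The helper definitions are repeated from the example so the snippet runs on its own; the toy sizes are only an assumption for illustration.

```r
# Helpers repeated from the example above
CutBySize <- function(m, nb) {
  int <- m / nb

  upper <- round(1:nb * int)
  lower <- c(1, upper[-nb] + 1)
  size <- c(upper[1], diff(upper))

  cbind(lower, upper, size)
}
seq2 <- function(lims) seq(lims[1], lims[2])

# Toy case: 10 columns split across 3 workers
intervals <- CutBySize(10, 3)
print(intervals)  # one row per worker: first column, last column, block size

# Column indices handled by the second worker
print(seq2(intervals[2, ]))
```

Each row gives one contiguous block of columns, and together the blocks cover every column exactly once, which is why the concatenated results match colSums(bm).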

You can even call rm(bm); gc() after writing the parts to disk.
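
A minimal sketch of that idea, run sequentially so it stays self-contained: write the parts, free the full matrix, then work only from the files. The names part_files and col_groups are hypothetical, and parallel::splitIndices is used here as a stand-in for CutBySize.

```r
# Small stand-in matrix (the real one would be much larger)
bm <- matrix(1, 100, 12)
ncores <- 3

# Keep the reference result only so the check at the end is possible
expected <- colSums(bm)

# Split the column indices into one contiguous group per worker
col_groups <- parallel::splitIndices(ncol(bm), ncores)

# Write each block of columns to its own rds file
part_files <- file.path(tempdir(), paste0("bm_part_", seq_len(ncores), ".rds"))
for (ic in seq_len(ncores)) {
  saveRDS(bm[, col_groups[[ic]], drop = FALSE], part_files[ic])
}

# The full matrix is no longer needed in the master session
rm(bm)
invisible(gc())

# Each worker would read only its own part; done sequentially here
colsums <- unlist(lapply(part_files, function(f) colSums(readRDS(f))))

# Remove the temporary files once the results are collected
unlink(part_files)

print(all.equal(colsums, expected))
```

In the parallel version the lapply call would be replaced by the foreach loop from the example, with each worker reading its own file.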


Source: https://habr.com/ru/post/1683974/

