So, if you have enough RAM, just use standard R matrices. To send each worker of the cluster only its own part of the matrix, use RDS files.
An example that computes colSums on 3 cores:
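# Split 1:m into nb contiguous intervals of (almost) equal size
CutBySize <- function(m, nb) {
  int <- m / nb
  upper <- round(1:nb * int)
  lower <- c(1, upper[-nb] + 1)
  size <- c(upper[1], diff(upper))
  cbind(lower, upper, size)
}
# Expand one row of CutBySize() (lower, upper, ...) into an index sequence
seq2 <- function(lims) seq(lims[1], lims[2])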
# The matrix
bm <- matrix(1, 10e3, 1e3)
ncores <- 3
intervals <- CutBySize(ncol(bm), ncores)
# Save each part in a different file
tmpfile <- tempfile()
for (ic in seq_len(ncores)) {
  saveRDS(bm[, seq2(intervals[ic, ])],
          paste0(tmpfile, ic, ".rds"))
}
cl <- parallel::makeCluster(ncores)
doParallel::registerDoParallel(cl)
library(foreach)
colsums <- foreach(ic = seq_len(ncores), .combine = 'c') %dopar% {
  bm.part <- readRDS(paste0(tmpfile, ic, ".rds"))
  colSums(bm.part)
}
parallel::stopCluster(cl)
all.equal(colsums, colSums(bm))
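You can even run rm(bm); gc() after writing the parts to disk, which frees the memory held by the full matrix before the workers start. In that case, the final all.equal() check needs a reference value computed before bm is removed.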
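A minimal sketch of that variant, assuming a smaller matrix and using only the parallel package that ships with R (parLapply instead of foreach). The names blocks, files and expected are illustrative and not part of the example above; parallel::splitIndices stands in for CutBySize here.

```r
# Illustrative variant: free the full matrix before the workers run.
bm <- matrix(1, 1e3, 90)
ncores <- 3

# Contiguous column blocks, one per worker
blocks <- parallel::splitIndices(ncol(bm), ncores)

# One RDS file per block (drop = FALSE keeps each part a matrix)
files <- paste0(tempfile(), "_", seq_len(ncores), ".rds")
for (ic in seq_len(ncores)) {
  saveRDS(bm[, blocks[[ic]], drop = FALSE], files[ic])
}

# Reference value, kept only to check the result afterwards
expected <- colSums(bm)

# The full matrix is no longer needed in the main session
rm(bm); gc()

cl <- parallel::makeCluster(ncores)
# Each worker reads only its own file
colsums <- unlist(parallel::parLapply(cl, files, function(f) {
  colSums(readRDS(f))
}))
parallel::stopCluster(cl)

# Remove the temporary files
unlink(files)

all.equal(colsums, expected)
```

Only the file paths are sent to the workers, so each one loads just its own block from disk.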