Parallel `for` loop with array as output

How can I run a `for` loop in parallel (so that I can use all the processors on my Windows machine) when the result is a 3-dimensional array? The code I have now takes about an hour to run and looks something like this:

    guad = array(NA, c(1680, 170, 15))
    for (r in 1:15) {
        name = paste("P:/......", r, ".csv", sep="")
        pp = read.table(name, sep=",", header=T)
        # lots of stuff to calculate x (which is a matrix)
        guad[,,r] = x
    }

I looked at related questions and thought I could use `foreach`, but I could not find a way to combine the matrices into an array.

I am new to parallel programming, so any help would be greatly appreciated!

1 answer

You can do this with `foreach` by combining the results with the `abind` function. Here is an example that uses the `doParallel` package as the parallel backend, which is quite portable:

    library(doParallel)
    library(abind)

    cl <- makePSOCKcluster(3)
    registerDoParallel(cl)

    acomb <- function(...) abind(..., along=3)

    guad <- foreach(r=1:4, .combine='acomb', .multicombine=TRUE) %dopar% {
        x <- matrix(rnorm(16), 4)  # compute x somehow
        x  # return x as the task result
    }

This uses the combine function `acomb`, which calls `abind` from the abind package to combine the matrices created by the cluster workers into a three-dimensional array. Setting `.multicombine=TRUE` lets foreach pass many task results to `acomb` in a single call.
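To see what `acomb` produces before running anything on a cluster, you can call it directly on a few small matrices. This is a minimal sequential sketch that assumes the abind package is installed; the matrices are made-up stand-ins for the per-task results. It also shows that `abind` accepts an existing 3-d array followed by more matrices, which matters if foreach ends up calling the combine function more than once, as can happen when there are more tasks than `.maxcombine`.

```r
# Sequential check of the combine function (no cluster needed).
# Assumes the abind package is installed: install.packages("abind")
library(abind)

acomb <- function(...) abind(..., along = 3)

# Made-up matrices standing in for the per-task results
m1 <- matrix(1:4, 2)
m2 <- matrix(5:8, 2)
m3 <- matrix(9:12, 2)

a <- acomb(m1, m2, m3)
print(dim(a))     # 2 2 3: one slice per matrix
print(a[, , 2])   # the second slice holds the values of m2

# An existing 3-d array can be extended with further matrices,
# which is what a repeated call to the combine function looks like
b <- acomb(a, matrix(13:16, 2))
print(dim(b))     # 2 2 4
```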

In this case, you can also combine the results with `cbind` and then modify the `dim` attribute to convert the resulting matrix into a three-dimensional array:

    guad <- foreach(r=1:4, .combine='cbind') %dopar% {
        x <- matrix(rnorm(16), 4)  # compute x somehow
        x  # return x as the task result
    }
    dim(guad) <- c(4,4,4)
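The `cbind` trick works because R stores a matrix column by column, and a 3-d array is stored the same way, one slice after another. This base-R sketch (no cluster, made-up 2 x 2 matrices) checks that each original matrix ends up in its own `[, , r]` slice after `dim<-`.

```r
# Base R only: why cbind followed by dim<- gives the right slices.
# cbind(m1, m2, m3) lays out the columns of m1, then m2, then m3;
# reading that same storage with three dimensions puts each matrix
# into its own [, , r] slice.
m1 <- matrix(1:4, 2)
m2 <- matrix(5:8, 2)
m3 <- matrix(9:12, 2)

g <- cbind(m1, m2, m3)   # a 2 x 6 matrix
dim(g) <- c(2, 2, 3)     # nrow, ncol, number of matrices

print(identical(g[, , 2], m2))   # TRUE
```

For the question's data, the new dimensions would be `c(1680, 170, 15)`, that is `c(nrow(x), ncol(x), number of tasks)`.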

Using `abind` is more flexible, because it can combine matrices and arrays in many different ways. Also be aware that resetting the `dim` attribute may cause the matrix to be copied, which can be a problem for large arrays.

Note that it is good practice to shut down the cluster at the end of the script with `stopCluster(cl)`.
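Shutting down the cluster is easy to forget, especially when a task fails partway through. One way to guarantee it is to create the cluster inside a function and register `stopCluster(cl)` with `on.exit()`. The sketch below does this with only the base `parallel` package (which doParallel builds on) instead of foreach, so it is an alternative to the approach above rather than the same technique: `parLapply()` runs one task per file, and `simplify2array(..., higher = TRUE)` stacks the equal-sized result matrices into a 3-d array. The names `compute_x`, `process_file` and `parallel_array` are hypothetical; `compute_x()` is only a placeholder for the question's "lots of stuff to calculate x".

```r
library(parallel)

# Hypothetical placeholder for "lots of stuff to calculate x":
# here it simply converts the data frame to a matrix.
compute_x <- function(pp) as.matrix(pp)

# Runs on a worker: read one CSV file and compute its matrix.
process_file <- function(name, compute) {
  pp <- read.table(name, sep = ",", header = TRUE)
  compute(pp)
}

# Hypothetical helper: process every file in parallel, then stack the
# equal-sized matrices along a third dimension.
parallel_array <- function(files, workers = 2) {
  cl <- makePSOCKcluster(workers)
  on.exit(stopCluster(cl))  # runs even if a task throws an error
  mats <- parLapply(cl, files, process_file, compute = compute_x)
  simplify2array(mats, higher = TRUE)  # dims: nrow x ncol x length(files)
}
```

For the question's 15 files, a call such as `parallel_array(paste("P:/......", 1:15, ".csv", sep = ""), workers = detectCores())` would return a 1680 x 170 x 15 array, assuming every `x` has those dimensions.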


Source: https://habr.com/ru/post/1490707/

