Writing to global variables when parallelizing with doSNOW in R?

Is there a problem accessing / writing to a global variable when using the doSNOW package on multiple cores?

In the program below, each call to MyCalculations(ii) writes to the 2nd column of the globalVariable matrix ...

Do you think the result will be correct? Will there be hidden catches?

Thanks a lot!

P.S. This is a simplified example. In the real program I have a lot of output that needs to be transferred out of the parallel loops, so writing to global variables looks like the only way to do it.

    library(doSNOW)

    MaxSearchSpace = 44 * 5
    globalVariable = matrix(0, 10000, MaxSearchSpace)

    cl <- makeCluster(7)
    registerDoSNOW(cl)

    # MyCalculations() is the user's own function (not shown here)
    foreach(ii = 2:MaxSearchSpace, .combine = cbind, .verbose = FALSE) %dopar% {
      MyCalculations(ii)
    }

    stopCluster(cl)

P.P.S. To restate the question: under doSNOW, is there any danger in accessing or writing to global variables? Thanks.

1 answer

Since this question was asked a couple of months ago, I hope you have found an answer by now. If you are still interested in feedback, here are some things to keep in mind:

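When using foreach with a parallel backend, you cannot assign to variables in the master's global R environment the way you are trying to (you have probably noticed this already). Each doSNOW worker is a separate R process with its own global environment, so an assignment made on a worker never reaches the master session. With a sequential backend the assignment works, but with a parallel backend such as doSNOW it does not.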
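The following is a minimal sketch of that behaviour, not part of the original answer. The variable name `counter` and the worker count of 2 are assumptions chosen only for the demonstration.

```r
library(doSNOW)

cl <- makeCluster(2)   # assumed: two local worker processes for the demo
registerDoSNOW(cl)

counter <- 0           # hypothetical global variable on the master

# Each worker receives its own copy of `counter`; the `<<-` assignment
# changes state inside the worker process only.
ignored <- foreach(ii = 1:4) %dopar% {
  counter <<- counter + ii
  NULL
}

stopCluster(cl)

print(counter)         # the master's copy was never modified
```

Whatever the workers did to their own copies, the value on the master is unchanged after the loop, which is why results have to be returned from the loop body instead.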

Instead, store all the results of each iteration in a list and return that list from the loop body. foreach collects the returned values into a single object, from which you can extract the results you need once all the calculations have finished.

My suggestion starts similarly to your example:

    library(doSNOW)

    MaxSearchSpace <- 44 * 5

    # do not create the globalVariable object
    cl <- makeCluster(parallel::detectCores())
    registerDoSNOW(cl)

    # Save the results of the `foreach` iterations as
    # a list of lists in an object (`theRes`)
    theRes <- foreach(ii = 2:MaxSearchSpace, .verbose = FALSE) %dopar% {
      # do some calculations
      theNorms <- rnorm(10000)
      thePois <- rpois(10000, 2)
      # store the results in a list
      list(theNorms, thePois)
    }

    # release the workers once the loop has finished
    stopCluster(cl)

After all iterations have completed, extract the results from theRes and save them as separate objects (e.g. globalVariable1, globalVariable2, etc.):

    globalVariable1 <- do.call(cbind, lapply(theRes, "[[", 1))
    globalVariable2 <- do.call(cbind, lapply(theRes, "[[", 2))
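As a possible variation (not from the original answer), the column-binding can be moved into foreach itself through a custom `.combine` function, so that the loop returns the finished matrices directly. The helper name `combineParts`, the worker count, and the small sizes below are all assumptions made to keep the sketch quick to run.

```r
library(doSNOW)

cl <- makeCluster(2)   # assumed worker count for the demo
registerDoSNOW(cl)

# Hypothetical helper: takes several per-iteration results (or an already
# combined result plus new ones) and column-binds each component separately.
combineParts <- function(...) {
  parts <- list(...)
  list(
    do.call(cbind, lapply(parts, "[[", 1)),
    do.call(cbind, lapply(parts, "[[", 2))
  )
}

combined <- foreach(ii = 1:6,
                    .combine = combineParts,
                    .multicombine = TRUE) %dopar% {
  # small vectors instead of 10000 values, to keep the example fast
  list(rnorm(5), rpois(5, 2))
}

stopCluster(cl)

globalVariable1 <- combined[[1]]   # 5 rows, one column per iteration
globalVariable2 <- combined[[2]]
```

The combine function runs on the master, so it does not need to be exported to the workers. Collecting a plain list and post-processing it afterwards, as shown above, is simpler and is usually enough.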

Keep in mind that if the calculations in one iteration depend on the results of previous iterations, this type of parallel computing is not suitable.
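One way to handle such a case, sketched here under assumptions of my own (the toy computation `ii^2` and the running total are purely illustrative), is to parallelize only the part that is independent per iteration and run the dependent step sequentially on the master afterwards.

```r
library(doSNOW)

cl <- makeCluster(2)   # assumed worker count for the demo
registerDoSNOW(cl)

# Independent part: each iteration needs only its own index,
# so it can safely run in parallel.
increments <- foreach(ii = 1:10, .combine = c) %dopar% {
  ii^2
}

stopCluster(cl)

# Dependent part: each value needs the previous one,
# so it stays in an ordinary loop on the master.
running <- numeric(length(increments))
running[1] <- increments[1]
for (k in 2:length(increments)) {
  running[k] <- running[k - 1] + increments[k]
}
```

If no part of the work can be separated out like this, a plain sequential loop is the appropriate tool.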


Source: https://habr.com/ru/post/909134/

