I am familiar with foreach, %dopar%, and the like. I am also familiar with the parallel option of cv.glmnet. But how do you set up nested parallelism, as shown below?
library(glmnet)
library(foreach)
library(parallel)
library(doSNOW)
Npar <- 1000
Nobs <- 200
Xdat <- matrix(rnorm(Nobs * Npar), ncol = Npar)
Xclass <- rep(1:2, each = Nobs/2)
Ydat <- rnorm(Nobs)
Parallel Cross Validation:
cl <- makeCluster(8, type = "SOCK")
registerDoSNOW(cl)
system.time(mods <- foreach(x = 1:2, .packages = "glmnet") %dopar% {
  idx <- Xclass == x
  cv.glmnet(Xdat[idx, ], Ydat[idx], nfolds = 4, parallel = TRUE)
})
stopCluster(cl)
Non-parallel Cross Validation:
cl <- makeCluster(8, type = "SOCK")
registerDoSNOW(cl)
system.time(mods <- foreach(x = 1:2, .packages = "glmnet") %dopar% {
  idx <- Xclass == x
  cv.glmnet(Xdat[idx, ], Ydat[idx], nfolds = 4, parallel = FALSE)
})
stopCluster(cl)
The two system times differ only very slightly.
Is nested parallelism possible here? Or do I need to explicitly use a nested statement?
Side question: if the cluster object has 8 cores and the foreach loop contains two tasks, will each task be given 1 core (leaving the remaining 6 cores idle), or will each task be given 4 cores (using all 8 cores in total)? How can I query how many cores are currently in use?
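For reference, here is a minimal diagnostic sketch for the side question. It assumes the same doSNOW setup as above, but with only 2 workers to keep it small. getDoParWorkers(), getDoParName() and getDoParRegistered() are functions from foreach; the variable names (n_workers, backend, worker_info) are just illustrative. It reports how many workers are registered in the master session, and asks each task which process it ran in and whether that process has a parallel backend of its own registered. As far as I understand, a backend registered in the master session is not automatically registered on the workers, so this may help explain the near-identical timings.

```r
library(foreach)
library(doSNOW)

# Small cluster for illustration (the question uses 8 workers)
cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# Seen from the master session: registered backend and worker count
n_workers <- getDoParWorkers()
backend <- getDoParName()

# Seen from inside each task: which process ran it, and whether that
# process has its own foreach backend registered
worker_info <- foreach(x = 1:2, .packages = "foreach") %dopar% {
  list(task = x,
       pid = Sys.getpid(),
       backend_registered = getDoParRegistered())
}

stopCluster(cl)

print(n_workers)
print(backend)
str(worker_info)
```

If backend_registered comes back FALSE for the tasks, then an inner %dopar% (such as the one cv.glmnet uses when parallel = TRUE) would have no backend to dispatch to on the worker and would be expected to fall back to sequential execution there.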