The following is an MWE of my problem: I programmed a progress bar for a function that bootstraps (via the boot function from the boot package).
This works fine as long as I don't use parallel processing (res_1core below). If I enable parallel processing by setting parallel = "multicore" and ncpus = 2, the progress bar is not displayed correctly (res_2core below).
    library(boot)

    rsq <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) {
      env <- environment()
      counter <- 0
      progbar <- txtProgressBar(min = 0, max = R, style = 3)
      bootfun <- function(formula, data, indices) {
        d <- data[indices,]
        fit <- lm(formula, data = d)
        curVal <- get("counter", envir = env)
        assign("counter", curVal + 1, envir = env)
        setTxtProgressBar(get("progbar", envir = env), curVal + 1)
        return(summary(fit)$r.square)
      }
      res <- boot(data = data, statistic = bootfun, R = R, formula = formula, parallel = parallel, ncpus = ncpus)
      return(res)
    }

    res_1core <- rsq(mpg ~ wt + disp, data = mtcars, R = 1000)
    res_2core <- rsq(mpg ~ wt + disp, data = mtcars, R = 1000, parallel = "multicore", ncpus = 2)
I read that this is because the boot function calls lapply for single-core processing and mclapply for multi-core processing. Does anyone know of a convenient workaround for this? That is, I would like the progress bar to reflect the combined progress of all parallel processes.
Update
Thanks to the contribution of Karolis Koncevičius, I found a workaround (just use the updated rsq function below):
    rsq <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) {
      bootfun <- function(formula, data, indices) {
        d <- data[indices,]
        fit <- lm(formula, data = d)
        return(summary(fit)$r.square)
      }
      env <- environment()
      counter <- 0
      progbar <- txtProgressBar(min = 0, max = R, style = 3)
      flush.console()
      intfun <- function(formula, data, indices) {
        curVal <- get("counter", envir = env) + ncpus
        assign("counter", curVal, envir = env)
        setTxtProgressBar(get("progbar", envir = env), curVal)
        bootfun(formula, data, indices)
      }
      res <- boot(data = data, statistic = intfun, R = R, formula = formula, parallel = parallel, ncpus = ncpus)
      return(res)
    }
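A possible variation, offered only as a sketch and not as a confirmed fix: instead of scaling a per-process counter by ncpus, the processes could share one tally through a temporary file, so that every call sees the combined count. The names rsq_shared and tally are hypothetical, and the sketch assumes all processes run on the same machine with a local temp directory. It does not address the console issue, since each child still writes the bar to its own output.

```r
library(boot)

rsq_shared <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) {
  parallel <- match.arg(parallel)
  tally <- tempfile()                 # shared tally: one byte per completed call
  file.create(tally)
  on.exit(unlink(tally))
  # boot also calls the statistic once on the original data, hence R + 1
  progbar <- txtProgressBar(min = 0, max = R + 1, style = 3)
  bootfun <- function(formula, data, indices) {
    d <- data[indices, ]
    fit <- lm(formula, data = d)
    cat("x", file = tally, append = TRUE)   # record this call for all processes
    # file size in bytes equals the number of calls finished so far
    setTxtProgressBar(progbar, min(file.size(tally), R + 1))
    summary(fit)$r.squared
  }
  res <- boot(data = data, statistic = bootfun, R = R, formula = formula,
              parallel = parallel, ncpus = ncpus)
  close(progbar)
  res
}
```

It would be called the same way as rsq, for example rsq_shared(mpg ~ wt + disp, data = mtcars, R = 1000, parallel = "multicore", ncpus = 2).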
Unfortunately, with multi-core processing this only works when I run R from the terminal. Any ideas on fixing this so that it also displays correctly in the R or RStudio console?