TxtProgressBar for a parallel bootstrap that does not display correctly

The following is the MWE of my problem: I programmed a progress bar for some function using boot (via the boot function from the boot package).

This works fine until I use parallel processing ( res_1core below). If I want to use parallel processing by setting parallel = "multicore" and ncpus = 2 , the progress bar res_2core not display correctly ( res_2core below).

 library(boot) rsq <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) { env <- environment() counter <- 0 progbar <- txtProgressBar(min = 0, max = R, style = 3) bootfun <- function(formula, data, indices) { d <- data[indices,] fit <- lm(formula, data = d) curVal <- get("counter", envir = env) assign("counter", curVal + 1, envir = env) setTxtProgressBar(get("progbar", envir = env), curVal + 1) return(summary(fit)$r.square) } res <- boot(data = data, statistic = bootfun, R = R, formula = formula, parallel = parallel, ncpus = ncpus) return(res) } res_1core <- rsq(mpg ~ wt + disp, data = mtcars, R = 1000) res_2core <- rsq(mpg ~ wt + disp, data = mtcars, R = 1000, parallel = "multicore", ncpus = 2) 

I read that this is because the download function calls lapply for single-core processing and mclapply for multi-core processing. Does anyone know of a convenient workaround to handle this? I want to say that I would like to show progress taking into account all parallel processes.

Update

Thanks to the contribution of Karolis Koncevičius, I found a workaround (just use the updated rsq function below):

 rsq <- function(formula, data, R, parallel = c("no", "multicore", "snow"), ncpus = 1) { bootfun <- function(formula, data, indices) { d <- data[indices,] fit <- lm(formula, data = d) return(summary(fit)$r.square) } env <- environment() counter <- 0 progbar <- txtProgressBar(min = 0, max = R, style = 3) flush.console() intfun <- function(formula, data, indices) { curVal <- get("counter", envir = env) + ncpus assign("counter", curVal, envir = env) setTxtProgressBar(get("progbar", envir = env), curVal) bootfun(formula, data, indices) } res <- boot(data = data, statistic = intfun, R = R, formula = formula, parallel = parallel, ncpus = ncpus) return(res) } 

Unfortunately, this only works for multi-core processing when I run R from the terminal. Any ideas on fixing this so that it also displays correctly in the R or Rstudio console?

+6
source share
1 answer

Not what you ordered, but may be useful.

Simple statistics function to download:

 library(boot) bootfun <- function(formula, data, indices) { d <- data[indices,] fit <- lm(formula, data=d) summary(fit)$r.square } 

Higher order function to display progress:

 progressReporter <- function(total, nBars=100, f, ...) { count <- 1 step <- ceiling(total/nBars) cat(paste(rep("|", nBars), collapse=""), "\r") flush.console() function(...) { if (count %% step==0) { cat(".") } count <<- count + 1 f(...) } } 

Now this function is deceiving - it shows progress at each "step" of iterations. If you have 1000 iterations, use two cores and print every 10th iteration - it will do the job. Cores do not share the state, but each of them will run a counter up to 500, and the function will respond to both counters.

On the other hand, if you perform 1000 iterations, run 10 cores and report every 200 - the function will remain silent, since all cores will be counted up to 100 each. None of them will reach 200 - there is no progress bar. Hope you get this idea. I think in most cases this should be good.

Try:

 res_1core <- boot(formula="mpg ~ wt + disp", data=mtcars, R=1000, statistic=progressReporter(1000, nBars=100, f=bootfun)) res_2core <- boot(formula="mpg ~ wt + disp", data=mtcars, R=1000, statistic=progressReporter(1000, nBars=100, f=bootfun), parallel="multicore", ncpus=2) 
+5
source

Source: https://habr.com/ru/post/978401/


All Articles