What is the best practice for parallelizing functions in my R package?

I developed an R package containing embarrassingly parallel functions.

I would like to implement parallelization for these functions in a way that is transparent to the user, regardless of their OS (at least ideally).

I looked at how other package authors handle foreach-based parallelism. For example, Max Kuhn's caret package imports foreach in order to use %dopar%, but relies on the user to register a parallel backend. (A few examples use doMC, which does not work on Windows.)

Noting that doParallel works on Windows as well as Linux / OS X and uses the built-in parallel package (see the comments here for a useful discussion), does it make sense to import doParallel and have my functions call registerDoParallel() whenever the user specifies parallel=TRUE as an argument?
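Concretely, I am imagining something like this (just a sketch, not tested; myfun is a placeholder for one of my package's functions):

library(doParallel)

myfun <- function(x, parallel = FALSE, cores = 2) {
  if (parallel) {
    # create, register and later tear down a temporary PSOCK cluster
    cl <- makePSOCKcluster(cores)
    registerDoParallel(cl)
    on.exit(stopCluster(cl))
    `%op%` <- `%dopar%`
  } else {
    `%op%` <- `%do%`
  }
  # sqrt() stands in for the real per-element work
  foreach(xi = x, .combine = 'c') %op% sqrt(xi)
}

myfun(1:4)                    # sequential
myfun(1:4, parallel = TRUE)   # parallel via a temporary cluster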

2 answers

I think it makes sense to support transparent parallel execution, but if your functions always register doParallel themselves, how does the user control things like the number of workers, or pass makeCluster options such as "outfile"? And what if they would rather use a different backend altogether?

My suggestion is to check getDoParRegistered first, and only create and register a cluster of your own when the user has not already registered a backend.

For example:

library(doParallel)
parfun <- function(n=10, parallel=FALSE,
                   cores=getOption('mc.cores', 2L)) {
  if (parallel) {
    # honor registration made by user, and only create and register
    # our own cluster object once
    if (! getDoParRegistered()) {
      cl <- makePSOCKcluster(cores)
      registerDoParallel(cl)
      message('Registered doParallel with ',
              cores, ' workers')
    } else {
      message('Using ', getDoParName(), ' with ',
              getDoParWorkers(), ' workers')
    }
    `%d%` <- `%dopar%`
  } else {
    message('Executing parfun sequentially')
    `%d%` <- `%do%`
  }

  foreach(i=seq_len(n), .combine='c') %d% {
    Sys.sleep(1)
    i
  }
}

If the user does not specify parallel=TRUE, the function executes sequentially:

> parfun()
Executing parfun sequentially
 [1]  1  2  3  4  5  6  7  8  9 10

The first time it is called with parallel=TRUE and no backend has been registered, it creates and registers its own cluster:

> parfun(parallel=TRUE, cores=3)
Registered doParallel with 3 workers
 [1]  1  2  3  4  5  6  7  8  9 10

If parfun is called with parallel=TRUE again, it reuses the backend that is already registered:

> parfun(parallel=TRUE)
Using doParallelSNOW with 3 workers
 [1]  1  2  3  4  5  6  7  8  9 10

A side benefit of this approach is that the user stays in control: if they want a different backend (doMC on Linux, for instance, or a PSOCK cluster spanning several machines), they simply register it before calling parfun, and that registration is honored.
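For instance (a sketch of mine, not part of the code above; the hostnames n1 and n2 are placeholders, and doMC only works on Unix-like systems), the user could register doMC themselves:

> library(doMC)
> registerDoMC(cores=3)
> parfun(parallel=TRUE)
Using doMC with 3 workers
 [1]  1  2  3  4  5  6  7  8  9 10

or register a PSOCK cluster that spans several machines:

> library(doParallel)
> cl <- makePSOCKcluster(c('localhost', 'n1', 'n2'))
> registerDoParallel(cl)
> parfun(parallel=TRUE)
Using doParallelSNOW with 3 workers
 [1]  1  2  3  4  5  6  7  8  9 10
> stopCluster(cl)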


Also note that defaulting to all the cores reported by detectCores() is frowned upon by CRAN. That is why the default for the cores argument is getOption('mc.cores', 2L), which is the same default used by mclapply, so the user can raise it globally by setting the mc.cores option.
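In other words, in a fresh session (no backend registered yet) the user can opt in to more workers without touching parfun's arguments; a sketch, assuming parfun from above is loaded:

> getOption('mc.cores', 2L)
[1] 2
> options(mc.cores=4)
> parfun(parallel=TRUE)
Registered doParallel with 4 workers
 [1]  1  2  3  4  5  6  7  8  9 10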


What about stopCluster?

You may have noticed that the cluster created inside parfun is never shut down with stopCluster. This is not a disaster, since the workers go away when the R session ends, but it is not ideal either, because the cluster object cl is local to parfun and the caller has no handle with which to stop it.

Possible ways to deal with this:

  • call stopCluster at the end of parfun, but only when parfun itself called makePSOCKcluster (see the sketch after this list);
  • register the backend with a core count rather than a cluster object, so it can be cleaned up with stopImplicitCluster from doParallel;
  • leave registration and cleanup entirely to the user and document that clearly.

None of these is perfect, so pick whichever trade-off suits your package and document the behavior so users know what to expect.
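Here is a minimal sketch of the first option (my own variant, not the answer's code above): the cluster is stopped, and the sequential backend re-registered, only when the function created the cluster itself.

library(doParallel)

parfun2 <- function(n=10, parallel=FALSE,
                    cores=getOption('mc.cores', 2L)) {
  if (parallel && ! getDoParRegistered()) {
    cl <- makePSOCKcluster(cores)
    registerDoParallel(cl)
    # stop our own cluster on exit (even on error) and re-register
    # the sequential backend so no dead cluster is left behind
    on.exit({
      stopCluster(cl)
      registerDoSEQ()
    }, add=TRUE)
  }
  `%d%` <- if (parallel) `%dopar%` else `%do%`

  foreach(i=seq_len(n), .combine='c') %d% {
    Sys.sleep(1)
    i
  }
}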


Another option is to build on the future framework, which gives you a single API that works the same way regardless of how and where the code ends up being evaluated.

https://cran.r-project.org/package=future

You write your code once, and the user decides how it is evaluated by setting a plan before calling your functions: plan(multiprocess) for parallelism on the local machine, plan(cluster, workers = c("n1", "n3", "remote.server.org")) for an ad-hoc cluster of machines, and so on.
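A hedged sketch (slowfun is a made-up stand-in for a package function, and it uses the future.apply package, which sits on top of future):

library(future.apply)

slowfun <- function(n = 10) {
  # future_sapply() runs sequentially or in parallel, depending on
  # whatever plan() the user has set before calling slowfun()
  future_sapply(seq_len(n), function(i) {
    Sys.sleep(1)
    i
  })
}

# The user, not the package, picks the evaluation strategy:
# plan(sequential)                        # default: sequential
# plan(multisession, workers = 3)         # background R sessions on this machine
# plan(cluster, workers = c("n1", "n3"))  # ad-hoc cluster of machines
slowfun(5)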

If the user has access to an HPC scheduler such as Slurm, TORQUE/PBS or SGE, they can use future.BatchJobs, which implements the future API on top of BatchJobs, e.g. plan(batchjobs_slurm). Your code does not change at all. (There is also the newer future.batchtools package, based on batchtools.)
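For example (hedged, reusing the made-up slowfun from above and assuming the scheduler and packages are already set up):

library(future.BatchJobs)   # or: library(future.batchtools)
plan(batchjobs_slurm)       # or: plan(batchtools_slurm)
slowfun(100)                # the work is now submitted as Slurm jobs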


Source: https://habr.com/ru/post/1670451/

