I think it's reasonable to use detectCores() as a starting point for the number of workers/processes when calling mclapply or makeCluster. However, there are many reasons why you may want or need to start fewer workers, and even some cases where you can reasonably start more.
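For instance, here is a minimal sketch of using detectCores() only as a starting point and then adjusting it; leaving one core free is just an illustrative choice, not a rule:

```r
library(parallel)

n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L    # detectCores() can return NA on some platforms
n_workers <- max(1L, n_cores - 1L)   # illustrative: leave one core for the rest of the system

# Fork-based workers (Unix-alikes only)
res_fork <- mclapply(1:10, function(i) i^2, mc.cores = n_workers)

# Socket cluster (also works on Windows)
cl <- makeCluster(n_workers)
res_sock <- parLapply(cl, 1:10, function(i) i^2)
stopCluster(cl)
```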
On some hyperthreaded machines, for example, it may not be a good idea to set mc.cores=detectCores(). Or, if your script runs on an HPC cluster, you should not use more resources than the job scheduler assigned to your job. You also need to be careful in nested parallel situations, for example when your code may be executed in parallel by a calling function, or when you are executing a multithreaded function in parallel. In general, it's a good idea to run some preliminary benchmarks before starting a long job in order to determine an appropriate number of workers. I usually monitor the benchmark with top to see if the number of processes and threads makes sense, and to verify that the memory usage is reasonable.
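As a sketch of the HPC case, on a Slurm-managed cluster you could respect the scheduler's allocation instead of detectCores(); SLURM_CPUS_PER_TASK is Slurm-specific, so adapt the variable name to whatever your scheduler actually sets:

```r
library(parallel)

# Prefer the scheduler's allocation when it is available
cpus_alloc <- Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA)
n_workers <- if (!is.na(cpus_alloc)) as.integer(cpus_alloc) else detectCores()

res <- mclapply(seq_len(100), sqrt, mc.cores = n_workers)
```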
The advice you quoted is especially appropriate for package developers. It's certainly a bad idea for a package developer to always start detectCores() workers when calling mclapply or makeCluster, so it's best to leave the decision up to the end user. At the very least, the package should allow the user to specify the number of workers to start, but arguably detectCores() isn't even a good default value. That's why the default value of mc.cores was changed from detectCores() to getOption("mc.cores", 2L) when mclapply was included in the parallel package.
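A package function following that convention might look like the sketch below; par_square() is an invented name for illustration:

```r
# Default mirrors mclapply's own default, getOption("mc.cores", 2L),
# so the worker count stays under the end user's control.
par_square <- function(x, mc.cores = getOption("mc.cores", 2L)) {
  parallel::mclapply(x, function(v) v^2, mc.cores = mc.cores)
}
```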
I think the real warning point that you quoted is that the R functions should not assume that they belong to the whole machine, or that they are the only function in your script that uses multiple cores. If you call mclapply with mc.cores=detectCores() in the package that you send to CRAN, I expect your package to be rejected until you change it. But if you are an end user, running a parallel script on your own machine, then you decide how many script cores are allowed to use.
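As the end user on your own machine, you can make that decision once per session, for example:

```r
library(parallel)

# You own the machine, so using every core is your call
options(mc.cores = detectCores())

# Functions that default to getOption("mc.cores", 2L) now pick this up
res <- mclapply(1:8, function(i) Sys.getpid())
```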