parRF from caret not working with multiple cores

parRF from the R package caret does not work for me with more than one core, which is pretty ironic considering that the "par" in parRF stands for parallel. I am on a Windows machine, in case that is relevant. I have verified that I am using the latest versions of caret and doParallel.

I put together a minimal example; the code and its output are below. Any ideas?

Source

    library(caret)
    library(doParallel)

    trCtrl <- trainControl(
        method = "repeatedcv"
      , number = 2
      , repeats = 5
      , allowParallel = TRUE
    )

    # WORKS
    registerDoParallel(1)
    train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
    closeAllConnections()

    # FAILS
    registerDoParallel(2)
    train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
    closeAllConnections()

Output

    > library(caret)
    > library(doParallel)
    >
    > trCtrl <- trainControl(
    +     method = "repeatedcv"
    +   , number = 2
    +   , repeats = 5
    +   , allowParallel = TRUE
    + )
    >
    >
    > # WORKS
    > registerDoParallel(1)
    > train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
    Parallel Random Forest

    150 samples
      4 predictors
      3 classes: 'setosa', 'versicolor', 'virginica'

    ... some more model output, works fine!

    > closeAllConnections()
    >
    > # FAILS
    > registerDoParallel(2)
    > train(form = Species~., data=iris, trControl = trCtrl, method="parRF")
    Error in train.default(x, y, weights = w, ...) :
      final tuning parameters could not be determined
    In addition: Warning messages:
    1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
      There were missing values in resampled performance measures.
    2: In train.default(x, y, weights = w, ...) :
      missing values found in aggregated results
    > closeAllConnections()

Session Information

    > sessionInfo()
    R version 3.1.0 (2014-04-10)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
    [5] LC_TIME=German_Germany.1252

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] doParallel_1.0.8   iterators_1.0.7    foreach_1.4.2      e1071_1.6-3        randomForest_4.6-7 caret_6.0-30       ggplot2_1.0.0
    [8] lattice_0.20-29

    loaded via a namespace (and not attached):
     [1] BradleyTerry2_1.0-4 brglm_0.5-9         car_2.0-20          class_7.3-10        codetools_0.2-8     colorspace_1.2-4
     [7] compiler_3.1.0      digest_0.6.4        gnm_1.0-7           grid_3.1.0          gtable_0.1.2        gtools_3.4.1
    [13] lme4_1.1-6          MASS_7.3-31         Matrix_1.1-3        minqa_1.2.3         munsell_0.4.2       nlme_3.1-117
    [19] nnet_7.3-8          plyr_1.8.1          proto_0.3-10        qvcalc_0.8-8        Rcpp_0.11.2         RcppEigen_0.3.2.1.2
    [25] relimp_1.0-3        reshape2_1.4        scales_0.2.4        splines_3.1.0       stringr_0.6.2       tcltk_3.1.0
    [31] tools_3.1.0

Update

  • Tried it with R 3.1.1 (same package versions): same result.
  • Tried it with R 3.0.2 and older versions of doParallel and caret: it worked (see the second session information below).

Session 2 Information:

    R version 3.0.2 (2013-09-25)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
    [4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] e1071_1.6-1        class_7.3-9        randomForest_4.6-7 doParallel_1.0.6   iterators_1.0.6
     [6] caret_5.17-7       reshape2_1.2.2     plyr_1.8           lattice_0.20-24    foreach_1.4.1
    [11] cluster_1.14.4

    loaded via a namespace (and not attached):
    [1] codetools_0.2-8 compiler_3.0.2  grid_3.0.2      stringr_0.6.2   tools_3.0.2
1 answer

This appears to be a bug in caret 6.0-30 that was introduced sometime after version 5.17-7. It is also yet another problem that mainly affects Windows users, since doParallel's "mclapply" mode works while its "clusterApplyLB" mode fails.

I ran several tests, and the problem seems to be that the cluster workers are not properly initialized for nested parallel computation. You can work around the error by loading the foreach package on the cluster workers before calling "train". To do that, you need to create the cluster object explicitly instead of letting the registerDoParallel function create it for you (which it does on Windows). For example:

    cl <- makePSOCKcluster(2)
    clusterEvalQ(cl, library(foreach))
    registerDoParallel(cl)
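Combined with the example from the question, the workaround might look roughly like the sketch below. It assumes caret, randomForest, and doParallel are installed; the variable name `fit` and the final `stopCluster(cl)` call are additions for illustration and do not appear in the question. I have not re-run this against caret 6.0-30, so treat it as a sketch rather than a verified fix.

```r
library(caret)
library(doParallel)  # also attaches foreach and parallel

trCtrl <- trainControl(
  method = "repeatedcv",
  number = 2,
  repeats = 5,
  allowParallel = TRUE
)

# Create the cluster explicitly instead of calling registerDoParallel(2)
cl <- makePSOCKcluster(2)

# Load foreach on every worker so nested foreach calls can run there
clusterEvalQ(cl, library(foreach))

# Register the explicitly created cluster as the foreach backend
registerDoParallel(cl)

# Same model as in the question; `fit` is just an illustrative name
fit <- train(Species ~ ., data = iris, trControl = trCtrl, method = "parRF")
print(fit)

# Shut the workers down when finished (replaces closeAllConnections())
stopCluster(cl)
```

Because the cluster is created explicitly here, it can be shut down with `stopCluster(cl)` rather than the broader `closeAllConnections()` used in the question.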

I will contact the author of caret to discuss a fix.


Source: https://habr.com/ru/post/972335/

