Why do save and saveRDS act differently inside dopar?

(This is my first time writing a question with a reproducible example, so please feel free to comment on how to describe or illustrate the problem better!)

MAIN APPLICATION OF THE ISSUE

I train ~25,000 models in parallel using foreach with %dopar% and caretList (from the caretEnsemble package). Because of memory problems in R, I need to save each forecast as a separate object, so my workflow looks roughly like this (a full reproducible example follows further down):

library(doParallel)   # also attaches foreach and parallel
library(caret)
library(caretEnsemble)
## fitControl is the trainControl() object defined in the reproducible example below

cl <- makePSOCKcluster(4)
clusterEvalQ(cl, library(foreach))
registerDoParallel(cl)

multiple.forecasts <- foreach(x=1:1,.combine='rbind',.packages=c('zoo','earth','caret',"glmnet","caretEnsemble")) %dopar% {
  tryCatch({
    results <- caretList(mpg ~ cyl,data=mtcars,trControl=fitControl,methodList=c("glmnet","lm","earth"),continue_on_fail = TRUE)
    for (i in seq_along(results)) {
      results[[i]]$trainingData <- NULL ## should trim out trainingData
    }
    save(results,file="foreach_results.Rdata") ## export each caretList as its own object
    1
  },
  error = function(e) {
    write.csv(e$message,file="foreach_failure.txt") ## monitor failures as needed
    0
  }
  )
}
stopCluster(cl)

(In the real project I don't use mtcars: each iteration of the foreach loop works on one data frame from a list and saves a new forecast object for that data frame.)

When this runs inside foreach, the saved file is about 136 […] in Windows.

However, when I run the same code outside foreach, like this:

results <- caretList(mpg ~ cyl,data=mtcars,trControl=fitControl,methodList=c("glmnet","lm","earth"),continue_on_fail = TRUE)
for (i in 1:length(results)) {
    results[[i]]$trainingData <- c()
}
save(results,file="no_foreach_results.Rdata")

the saved object is about 156 […] in Windows. Why does the file size differ in Windows?

[…] foreach, 4 […]; […] foreach, 10 […]. Across 25,000 models, this adds up.

  • Is there something about foreach that makes saved files larger, and if so, why?

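  • What I have tried so far to make save give the same result inside foreach, including saveRDS (see below), without success:
  • Trimming the caretList: I set trim = TRUE in trainControl and, as in the code above, also remove trainingData by hand.
  • Saving with xz compression: this helps with the foreach files, but […] 3-4 […].
  • PSOCK and caret: […].
  • Using saveRDS, i.e. calling saveRDS instead of save: the same difference appears.
  • Removing tryCatch: taking tryCatch out of the foreach loop does not change the result.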
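To compare the save, saveRDS and compression options from the list above on equal terms, here is a minimal sketch that writes one object with both functions under each of base R's three compression methods and reports the file sizes in bytes. The object `obj` is a hypothetical stand-in for a caretList; substituting `results` would give the real numbers. It only measures size on disk, so on its own it does not explain a foreach-specific difference.

```r
# Compare on-disk size of one object written by save() and saveRDS()
# under each compression method that both functions accept.
set.seed(1)
obj <- list(coef = rnorm(1e5), note = "example")  # hypothetical stand-in for a caretList

sizes <- sapply(c("gzip", "bzip2", "xz"), function(method) {
  f_rdata <- tempfile(fileext = ".Rdata")
  f_rds   <- tempfile(fileext = ".rds")
  save(obj, file = f_rdata, compress = method)     # finds obj in the global env
  saveRDS(obj, file = f_rds, compress = method)
  c(save = file.size(f_rdata), saveRDS = file.size(f_rds))
})

print(sizes)  # rows: writer, columns: compression method, values in bytes
```

xz usually gives the smallest files but is the slowest to write, which matters when the loop runs 25,000 times.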

Reproducible example:

library(caret)
library(caretEnsemble)

## train a caretList without foreach loop
fitControl <- trainControl(## 10-fold CV
  method = "repeatedcv",
  number = 10,
  ## repeated ten times
  repeats = 10,
  trim=TRUE)

results <- caretList(mpg ~ cyl,data=mtcars,trControl=fitControl,methodList=c("glmnet","lm","earth"),continue_on_fail = TRUE)
for (i in 1:length(results)) {
    results[[i]]$trainingData <- c()
}
object.size(results) ##returns about 546536 bytes
save(results,file="no_foreach_results.Rdata") ##in Windows, the saved file is about 136 KB

## train a caretList with foreach loop
library(doParallel)

cl <- makePSOCKcluster(4)
clusterEvalQ(cl, library(foreach))
registerDoParallel(cl)

multiple.forecasts <- foreach(x=1:1,.combine='rbind',.packages=c('zoo','earth','caret',"glmnet","caretEnsemble")) %dopar% {
  tryCatch({
    results <- caretList(mpg ~ cyl,data=mtcars,trControl=fitControl,methodList=c("glmnet","lm","earth"),continue_on_fail = TRUE)
    for (i in 1:length(results)) {
      results[[i]]$trainingData <- c()
    }
    save(results,file="foreach_results.Rdata") ## in Windows, the saved file is about 160 KB
    ## loading this file back in and running object.size gives about 546504 bytes, approximately the same
    1
  },
  error = function(e) {
    write.csv(e$message,file="foreach_failure.txt")
    0
  }
  )
}
stopCluster(cl)
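One possible explanation, offered as an assumption rather than a confirmed diagnosis, fits the observation in the comments above that object.size() barely changes while the file grows: a formula remembers the environment it was created in. object.size() does not count what an environment holds, but serialize(), which both save() and saveRDS() rely on, writes out every environment other than the global one. At the top level, `mpg ~ cyl` points at the global environment, which is stored only as a reference. Inside %dopar%, as far as I understand, foreach evaluates the loop body in its own environment (holding the loop variable x and exported objects such as fitControl), so a formula created there, and presumably the terms object caret keeps in each train fit, would pull that environment into the file. The sketch below reproduces the effect in plain base R with a hypothetical helper `make_formula()`; run it at the top level.

```r
# Hypothetical helper: returns a formula created inside a function frame
# that also holds a large, unrelated object (about 8 MB of doubles).
make_formula <- function() {
  big_unrelated <- rnorm(1e6)
  mpg ~ cyl   # the formula's environment is this function's frame
}

f_global <- mpg ~ cyl          # environment: the global environment
f_local  <- make_formula()     # environment: make_formula()'s frame

# object.size() does not count what an environment contains ...
size_global <- object.size(f_global)
size_local  <- object.size(f_local)

# ... but serialize(), used by save() and saveRDS(), writes out every
# non-global environment it meets, including big_unrelated.
bytes_global <- length(serialize(f_global, NULL))
bytes_local  <- length(serialize(f_local, NULL))

# Pointing the formula back at the global environment drops the payload.
environment(f_local) <- globalenv()
bytes_fixed <- length(serialize(f_local, NULL))

print(c(size_global = size_global, size_local = size_local))
print(c(global = bytes_global, local = bytes_local, fixed = bytes_fixed))
```

If this is what happens inside foreach, resetting the environment of the stored formula or terms objects to globalenv() before saving, or passing x and y to train() instead of a formula, might bring the foreach file back down; neither is tested against caretList here.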

sessionInfo():

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doParallel_1.0.10   iterators_1.0.8     earth_4.4.4         plotmo_3.1.4        TeachingDemos_2.10 
 [6] plotrix_3.6-2       glmnet_2.0-5        foreach_1.4.3       Matrix_1.2-4        caretEnsemble_2.0.0
[11] caret_6.0-64        ggplot2_2.1.0       RevoUtilsMath_8.0.1 RevoUtils_8.0.1     RevoMods_8.0.1     
[16] RevoScaleR_8.0.1    lattice_0.20-33     rpart_4.1-10       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4        compiler_3.2.2     nloptr_1.0.4       plyr_1.8.3         tools_3.2.2       
 [6] lme4_1.1-11        digest_0.6.9       nlme_3.1-126       gtable_0.2.0       mgcv_1.8-12       
[11] SparseM_1.7        gridExtra_2.2.1    stringr_1.0.0      MatrixModels_0.4-1 stats4_3.2.2      
[16] grid_3.2.2         nnet_7.3-12        data.table_1.9.6   pbapply_1.2-1      minqa_1.2.4       
[21] reshape2_1.4.1     car_2.1-2          magrittr_1.5       scales_0.4.0       codetools_0.2-14  
[26] MASS_7.3-45        splines_3.2.2      pbkrtest_0.4-6     colorspace_1.2-6   quantreg_5.21     
[31] stringi_1.0-1      munsell_0.4.3      chron_2.3-47  

[…] (train_data) […]


Source: https://habr.com/ru/post/1650588/

