(This is my first time writing a question with a reproducible example, so please feel free to comment on how I could better describe or illustrate the problem!)
MAIN USE CASE
I train ~25,000 models in parallel using foreach %dopar% and caretList (from the caretEnsemble package). Because of R's memory limitations, I need to save each forecast as a separate object, so my workflow looks something like this (see below for a full reproducible example):
library(doParallel)  # attaches foreach and parallel, provides registerDoParallel()

cl <- makePSOCKcluster(4)
clusterEvalQ(cl, library(foreach))
registerDoParallel(cl)

multiple.forecasts <- foreach(x = 1:1, .combine = "rbind",
                              .packages = c("zoo", "earth", "caret", "glmnet", "caretEnsemble")) %dopar% {
  tryCatch({
    # fitControl is the trainControl() object defined in the full example below
    results <- caretList(mpg ~ cyl, data = mtcars, trControl = fitControl,
                         methodList = c("glmnet", "lm", "earth"), continue_on_fail = TRUE)
    # drop the stored copy of the training data from each model
    for (i in seq_along(results)) {
      results[[i]]$trainingData <- NULL
    }
    save(results, file = "foreach_results.Rdata")
    1
  },
  error = function(e) {
    write.csv(e$message, file = "foreach_failure.txt")
    0
  })
}
(In real life this project does not use the mtcars data: each iteration of the foreach loop works on one of the data frames in a list and saves a new forecast object for each data frame.)
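The per-data-frame workflow described above can be sketched in base R. This is only a sketch under stated assumptions: df_list, out_dir and the forecast_%05d.rds naming scheme are hypothetical, lm() stands in for caretList(), and a sequential vapply() stands in for foreach %dopar%. What it shows is each iteration writing its result to a file named after the iteration index, rather than every iteration writing to the same foreach_results.Rdata.

```r
# Hypothetical stand-in for the real list of data frames: mtcars split by cyl
df_list <- split(mtcars, mtcars$cyl)

# Hypothetical output folder; one file per data frame is written here
out_dir <- file.path(tempdir(), "forecasts")
dir.create(out_dir, showWarnings = FALSE)

# vapply() stands in for foreach %dopar%; each iteration fits one model
# (lm() as a placeholder for caretList) and saves it under its own name
status <- vapply(seq_along(df_list), function(x) {
  fit <- lm(mpg ~ wt, data = df_list[[x]])
  saveRDS(fit, file = file.path(out_dir, sprintf("forecast_%05d.rds", x)))
  1
}, numeric(1))

status
list.files(out_dir)
```

In the real loop, the file name would presumably be built from the foreach index x in the same way, so that no iteration overwrites another.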
When I run this with foreach, the saved file (foreach_results.Rdata) is 136 … in Windows …
If I run the same code without foreach:
results <- caretList(mpg ~ cyl, data = mtcars, trControl = fitControl,
                     methodList = c("glmnet", "lm", "earth"), continue_on_fail = TRUE)
for (i in seq_along(results)) {
  results[[i]]$trainingData <- NULL
}
save(results, file = "no_foreach_results.Rdata")
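In this case the saved file (no_foreach_results.Rdata) is 156 … in Windows. Why do the two files differ in size, and is this specific to Windows?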
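One thing that may be worth checking (my assumption, not something the measurements above establish): object.size(), used in the full example below, does not count the contents of environments, while save() and saveRDS() do serialize the environment attached to a model's formula and terms unless that environment is the global environment. A model fitted inside a foreach worker gets its formula from a non-global environment. The base-R sketch below uses lm() instead of caretList(), and fit_in_function and big are hypothetical names; it shows the two measurements diverging.

```r
# Fit a model inside a function, as happens inside a foreach worker: the
# formula's environment is then the function's local frame, not globalenv
fit_in_function <- function() {
  big <- rnorm(1e6)  # hypothetical large local object (about 8 MB)
  lm(mpg ~ cyl, data = mtcars)
}

fit_global <- lm(mpg ~ cyl, data = mtcars)  # formula environment is globalenv
fit_local  <- fit_in_function()

# object.size() ignores what the formula's environment contains ...
object.size(fit_global)
object.size(fit_local)

# ... but serialization, which save() and saveRDS() use, writes that
# environment (including big) unless it is the global environment
length(serialize(fit_global, NULL))
length(serialize(fit_local, NULL))
```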
With foreach, 4 …; with foreach, 10 …; … for 25,000 models ….
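Things I have tried to shrink the file saved inside foreach:
- … saveRDS (…) …
- Trim: … in caretList, setting trim = TRUE in trainControl, …, and removing trainingData from each model.
- save with xz compression: … with foreach, …
- 3-4 …
- … PSOCK …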
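The trainingData removal that the code here does with an index loop can also be written with lapply(). A sketch on a hypothetical mock object (a plain named list with a caretList class attribute, not a real caretEnsemble result; the method field is made up):

```r
# Hypothetical mock of a caretList: a named list of "models" that each
# carry a copy of the training data, with a class on the outer list
models <- structure(
  list(
    glmnet = list(method = "glmnet", trainingData = mtcars),
    lm     = list(method = "lm",     trainingData = mtcars)
  ),
  class = "caretList"
)

# Assigning into models[] keeps the outer list's attributes (names, class),
# whereas a plain lapply() result would be an unclassed list
models[] <- lapply(models, function(m) {
  m$trainingData <- NULL
  m
})

class(models)
names(models[[1]])
```

caret's trainControl() also has a returnData argument; setting it to FALSE may keep trainingData out of each train object in the first place, although I have not checked how caretList() handles that.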
caret: … . … . saveRDS … (saveRDS vs. save) … . tryCatch … . tryCatch inside foreach … .
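The save() versus saveRDS() and xz-compression experiments mentioned above can be compared side by side from within R, using file sizes in bytes instead of what Windows shows. A minimal sketch, with a hypothetical lm() fit standing in for the caretList object and temporary file names:

```r
# Hypothetical object standing in for a caretList; any R object works here
obj <- lm(mpg ~ ., data = mtcars)

files <- c(
  save_default = tempfile(fileext = ".Rdata"),
  save_xz      = tempfile(fileext = ".Rdata"),
  rds_xz       = tempfile(fileext = ".rds")
)

save(obj, file = files[["save_default"]])              # gzip by default
save(obj, file = files[["save_xz"]], compress = "xz")  # xz: slower, often smaller
saveRDS(obj, file = files[["rds_xz"]], compress = "xz")

# On-disk size of each file in bytes
sizes <- sapply(files, file.size)
sizes

# Reading back is the same regardless of the compression used
identical(readRDS(files[["rds_xz"]])$coefficients, obj$coefficients)
```

Compression only shrinks whatever gets serialized; if an object drags along extra data, that data is still written, just compressed.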
Full reproducible example:
library(caret)
library(caretEnsemble)

fitControl <- trainControl(
  method = "repeatedcv",
  number = 10,
  repeats = 10,
  trim = TRUE)

# Without foreach
results <- caretList(mpg ~ cyl, data = mtcars, trControl = fitControl,
                     methodList = c("glmnet", "lm", "earth"), continue_on_fail = TRUE)
for (i in seq_along(results)) {
  results[[i]]$trainingData <- NULL  # drop the stored copy of the training data
}
object.size(results)  # note: object.size() does not count the contents of environments
save(results, file = "no_foreach_results.Rdata")

# With foreach
library(doParallel)
cl <- makePSOCKcluster(4)
clusterEvalQ(cl, library(foreach))
registerDoParallel(cl)
multiple.forecasts <- foreach(x = 1:1, .combine = "rbind",
                              .packages = c("zoo", "earth", "caret", "glmnet", "caretEnsemble")) %dopar% {
  tryCatch({
    results <- caretList(mpg ~ cyl, data = mtcars, trControl = fitControl,
                         methodList = c("glmnet", "lm", "earth"), continue_on_fail = TRUE)
    for (i in seq_along(results)) {
      results[[i]]$trainingData <- NULL
    }
    save(results, file = "foreach_results.Rdata")
    1
  },
  error = function(e) {
    write.csv(e$message, file = "foreach_failure.txt")
    0
  })
}
stopCluster(cl)
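The tryCatch-per-iteration pattern in the example above can also be sketched with the base parallel package directly, without foreach/doParallel. Assumptions: two PSOCK workers, hypothetical inputs whose third data frame deliberately lacks the cyl column, lm() in place of caretList(), and made-up file names result_%05d.rds and failure_%05d.txt. Each failure goes to its own file, written with writeLines(conditionMessage(e)) rather than write.csv(), so iterations cannot overwrite each other's logs.

```r
library(parallel)

cl <- makePSOCKcluster(2)
out_dir <- tempdir()  # absolute path on the master; the workers write here too

# Hypothetical inputs: the third data frame lacks the cyl column, so it fails
inputs <- list(mtcars, mtcars, data.frame(mpg = 1:3))

status <- tryCatch(
  parLapply(cl, seq_along(inputs), function(x, inputs, out_dir) {
    tryCatch({
      fit <- lm(mpg ~ cyl, data = inputs[[x]])  # placeholder for caretList
      saveRDS(fit, file = file.path(out_dir, sprintf("result_%05d.rds", x)))
      1
    }, error = function(e) {
      # one log file per iteration, so failures do not overwrite each other
      writeLines(conditionMessage(e),
                 file.path(out_dir, sprintf("failure_%05d.txt", x)))
      0
    })
  }, inputs = inputs, out_dir = out_dir),
  finally = stopCluster(cl)  # shut the workers down even if parLapply() fails
)

unlist(status)
```

Passing inputs and out_dir as arguments to parLapply() makes explicit what the workers receive; foreach instead decides automatically which variables referenced in the loop body to export.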
sessionInfo():
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2012 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doParallel_1.0.10 iterators_1.0.8 earth_4.4.4 plotmo_3.1.4 TeachingDemos_2.10
[6] plotrix_3.6-2 glmnet_2.0-5 foreach_1.4.3 Matrix_1.2-4 caretEnsemble_2.0.0
[11] caret_6.0-64 ggplot2_2.1.0 RevoUtilsMath_8.0.1 RevoUtils_8.0.1 RevoMods_8.0.1
[16] RevoScaleR_8.0.1 lattice_0.20-33 rpart_4.1-10
loaded via a namespace (and not attached):
[1] Rcpp_0.12.4 compiler_3.2.2 nloptr_1.0.4 plyr_1.8.3 tools_3.2.2
[6] lme4_1.1-11 digest_0.6.9 nlme_3.1-126 gtable_0.2.0 mgcv_1.8-12
[11] SparseM_1.7 gridExtra_2.2.1 stringr_1.0.0 MatrixModels_0.4-1 stats4_3.2.2
[16] grid_3.2.2 nnet_7.3-12 data.table_1.9.6 pbapply_1.2-1 minqa_1.2.4
[21] reshape2_1.4.1 car_2.1-2 magrittr_1.5 scales_0.4.0 codetools_0.2-14
[26] MASS_7.3-45 splines_3.2.2 pbkrtest_0.4-6 colorspace_1.2-6 quantreg_5.21
[31] stringi_1.0-1 munsell_0.4.3 chron_2.3-47