How to make doSMP play well with plyr?

This code works:

    library(plyr)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel = FALSE)

So far this code is not working:

    library(doSMP)
    workers <- startWorkers(2)
    registerDoSMP(workers)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
    stopWorkers(workers)

    > Error in do.ply(i) : task 3 failed - "subscript out of bounds"
    In addition: Warning messages:
    1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'
    2: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'

I am using R 2.12.1, plyr 1.4 and doSMP 1.0-1. Has anyone figured out how to do this?

edit: In response to Andrie, here is another illustration:

    system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel = FALSE)) #1
    system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel = TRUE))  #2

    library(doSMP)
    workers <- startWorkers(2)
    registerDoSMP(workers)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel = FALSE)) #3
    system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel = TRUE))  #4
    stopWorkers(workers)

The first three calls work, but each takes about 3 seconds. Call #2 warns that no parallel backend has been registered, so it runs sequentially. Call #4 gives the same error as in my original post.
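The warning from call #2 can be anticipated by asking foreach which backend it currently sees. This is a minimal sketch, not part of the original question; `backend_status` is a hypothetical helper name, while `getDoParRegistered()`, `getDoParName()` and `getDoParWorkers()` are functions from the foreach package.

```r
library(foreach)

# Hypothetical helper: describe the foreach backend that a call with
# .parallel = TRUE would use at this point in the session.
backend_status <- function() {
  if (getDoParRegistered()) {
    sprintf("%s with %s worker(s)", getDoParName(), getDoParWorkers())
  } else {
    "no parallel backend registered; execution will be sequential"
  }
}

backend_status()
```

Calling it before and after `registerDoSMP(workers)` (or any other `registerDo*()` call) should show whether the registration took effect.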

Edit: curiouser and curiouser. The following works on my Mac:

    library(plyr)
    library(doMC)
    registerDoMC()
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)

But this fails:

    library(plyr)
    library(doSMP)
    workers <- startWorkers(2)
    registerDoSMP(workers)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
    stopWorkers(workers)

And this also fails:

    library(plyr)
    library(snow)
    library(doSNOW)
    cl <- makeCluster(2, type = "SOCK")
    registerDoSNOW(cl)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
    stopCluster(cl)

Therefore, I believe that the various parallel backends for foreach are not interchangeable, at least when used through plyr.
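One way to tell whether a failure comes from the backend itself or from plyr's use of it is to run the same per-group sum through foreach directly. The following is a sketch under the assumption that the snow and doSNOW packages are installed and that socket clusters can be started on the machine; the `pieces` and `sums` names are illustrative only.

```r
library(foreach)
library(snow)
library(doSNOW)

cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)

# Split by group manually, then sum each piece on the workers,
# bypassing plyr entirely.
pieces <- split(x, x$V)
sums <- foreach(piece = pieces, .combine = c) %dopar% sum(piece$Z)

stopCluster(cl)

sums  # one sum per group, in the order X, Y, Z
```

If this runs cleanly while the equivalent `ddply(..., .parallel = TRUE)` call fails, the problem most likely lies in how plyr hands its work to foreach rather than in the backend.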

3 answers

While @hadley answered the question well, I want to add that I think plyr now works with other foreach parallel backends. Here is a link to a blog post containing an example where plyr is used with doSNOW:


To confirm @LeeZamparo's answer, plyr now works with snow, at least on Windows 7 with R version 2.15.0. The last piece of code in the question works, but with cryptic warnings:

    library(plyr)
    library(snow)
    library(doSNOW)
    cl <- makeCluster(2, type = "SOCK")
    registerDoSNOW(cl)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)

    library(microbenchmark)
    mb <- microbenchmark(
      PP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE),
      NP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = FALSE)
    )
    stopCluster(cl)

The cryptic warnings:

    > warnings()
    Warning messages:
    1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...

The parallel version is not faster; I think the overhead ...

    > mb
    Unit: milliseconds
                                                                 expr
    1 NP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = FALSE)
    2  PP <- ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
            min        lq    median        uq       max
    1  11.91518  15.74567  20.10944  23.30453  38.09237
    2 314.58008 336.81160 348.42421 358.57337 575.11220

Check that the result is as expected:

    > PP
      V V1
    1 X  4
    2 Y  6
    3 Z  5

Additional information about this session:

    > sessionInfo()
    R version 2.15.0 (2012-03-30)
    Platform: i386-pc-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
    [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
    [5] LC_TIME=English_Australia.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] microbenchmark_1.1-3 doSNOW_1.0.6         iterators_1.0.6
    [4] foreach_1.4.0        plyr_1.7.1           snow_0.3-10

    loaded via a namespace (and not attached):
    [1] codetools_0.2-8 compiler_2.15.0 tools_2.15.0
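The benchmark above uses a task that takes only a few milliseconds, so the cost of dispatching work to the cluster dominates. As a rough sketch of when that overhead might pay off, the per-group function below is made artificially slow, reusing the `Sys.sleep(1)` idea from the question. It assumes the same doSNOW setup as this answer; `slow_sum`, `t_seq` and `t_par` are illustrative names, and timings are not shown because they vary by machine.

```r
library(plyr)
library(snow)
library(doSNOW)

cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)

# Illustrative slow task: wait one second, then return the group sum.
slow_sum <- function(df) {
  Sys.sleep(1)
  sum(df$Z)
}

t_seq <- system.time(res_seq <- ddply(x, .(V), slow_sum, .parallel = FALSE))
t_par <- system.time(res_par <- ddply(x, .(V), slow_sum, .parallel = TRUE))

stopCluster(cl)

# Elapsed wall-clock seconds for each variant.
c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]])
```

Both variants should return the same sums; only the elapsed time is expected to differ once each task is expensive enough.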

It turns out that plyr only works with doMC, but the developer is working on it.


Source: https://habr.com/ru/post/1346817/

