This code works:
    library(plyr)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel = FALSE)
So far this code is not working:
    library(doSMP)
    workers <- startWorkers(2)
    registerDoSMP(workers)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
    stopWorkers(workers)

    Error in do.ply(i) : task 3 failed - "subscript out of bounds"
    In addition: Warning messages:
    1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'
    2: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'
I am using R 2.1.12, plyr 1.4 and doSMP 1.0-1. Has anyone figured out how to do this?
Edit: In response to Andrie, here is another illustration:
    system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel = FALSE)) #1
    system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel = TRUE))  #2
    library(doSMP)
    workers <- startWorkers(2)
    registerDoSMP(workers)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel = FALSE)) #3
    system.time(ddply(x, .(V), function(df) Sys.sleep(1), .parallel = TRUE))  #4
    stopWorkers(workers)
The first three calls work, but each takes about 3 seconds. Call #2 gives a warning that no parallel backend has been registered and is therefore executed sequentially. Call #4 gives the same error that I referred to in my original post.
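For anyone trying to reproduce this, it may help to confirm what foreach itself considers registered just before each `ddply` call. The following is only a minimal sketch: `report_backend` is a hypothetical helper name, and it relies solely on the `getDoParRegistered()`, `getDoParName()` and `getDoParWorkers()` functions from the foreach package, not on anything specific to doSMP.

```r
library(foreach)

# Hypothetical helper: summarise the backend that foreach currently sees.
report_backend <- function() {
  list(
    registered = getDoParRegistered(),  # TRUE once a backend has been registered
    name       = getDoParName(),        # backend name, NULL if none is registered
    workers    = getDoParWorkers()      # number of workers foreach would use
  )
}

# Call this right before a ddply(..., .parallel = TRUE) call.
str(report_backend())
```

If the report shows no registered backend, the sequential fallback seen in call #2 would be expected; if it shows the intended backend with two workers, the failure in call #4 is more likely to lie in how the work is dispatched to that backend.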
Edit: Curiouser and curiouser. The following works on my Mac:
    library(plyr)
    library(doMC)
    registerDoMC()
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
But this fails:
    library(plyr)
    library(doSMP)
    workers <- startWorkers(2)
    registerDoSMP(workers)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
    stopWorkers(workers)
And this also fails:
    library(plyr)
    library(snow)
    library(doSNOW)
    cl <- makeCluster(2, type = "SOCK")
    registerDoSNOW(cl)
    x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)
    ddply(x, .(V), function(df) sum(df$Z), .parallel = TRUE)
    stopCluster(cl)
Therefore, I believe that the various parallel backends for foreach are not interchangeable, at least not when they are used through plyr.
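One way to narrow this down might be to run the same split-and-sum directly through foreach, bypassing plyr, under each backend. The sketch below is an assumption about how such a test could look, not a known fix: it uses `registerDoSEQ()` (the sequential backend shipped with foreach) as a stand-in so that it runs anywhere, and the variable names `pieces` and `res` are purely illustrative.

```r
library(foreach)

# Stand-in backend so the sketch runs without doSMP, doSNOW or doMC installed.
registerDoSEQ()

x <- data.frame(V = c("X", "Y", "X", "Y", "Z"), Z = 1:5)

# Split by group, roughly what ddply does with .(V).
pieces <- split(x, x$V)

# One task per group, dispatched through whichever backend is registered.
res <- foreach(piece = pieces, .combine = c) %dopar% sum(piece$Z)
res
```

To test another backend, replace `registerDoSEQ()` with that backend's registration call. If this works under a backend where the `ddply(..., .parallel = TRUE)` call fails, the problem would seem to be in how plyr passes its work to foreach rather than in the backend itself.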