Slower ddply when .parallel = TRUE on Mac OS X version 10.6.7

I am trying to run ddply in parallel on my mac. The code I used is as follows:

    library(doMC)
    library(ggplot2)  # for the purposes of getting the baseball data.frame
    registerDoMC(2)

    > system.time(ddply(baseball, .(year), numcolwise(mean)))
       user  system elapsed
      0.959   0.106   1.522
    > system.time(ddply(baseball, .(year), numcolwise(mean), .parallel = TRUE))
       user  system elapsed
      2.221   2.790   2.552

Why is ddply slower when I run it with .parallel = TRUE? I searched online to no avail. I also tried registerDoMC() with no arguments and the results were the same.

2 answers

The baseball data set may simply be too small for parallelizing the calculations to show any improvement; the overhead of shipping the data to the worker processes can swamp any speedup gained by doing the computations in parallel. Using the rbenchmark package:

    baseball10 <- baseball[rep(seq(length = nrow(baseball)), 10), ]
    benchmark(noparallel   = ddply(baseball,   .(year), numcolwise(mean)),
              parallel     = ddply(baseball,   .(year), numcolwise(mean), .parallel = TRUE),
              noparallel10 = ddply(baseball10, .(year), numcolwise(mean)),
              parallel10   = ddply(baseball10, .(year), numcolwise(mean), .parallel = TRUE),
              replications = 10)

gives the following results:

              test replications elapsed relative user.self sys.self user.child sys.child
    1   noparallel           10   4.562 1.000000     4.145    0.408      0.000     0.000
    3 noparallel10           10  14.134 3.098203     9.815    4.242      0.000     0.000
    2     parallel           10  11.927 2.614423     2.394    1.107      4.836     6.891
    4   parallel10           10  18.406 4.034634     4.045    2.580     10.210     9.769

With the 10x data set, the penalty for running in parallel is smaller. A more computationally expensive per-group calculation would shift the balance further still, and would probably give the parallel version the edge (see the sketch below).

This was done on a Mac OS X 10.5.8 Core 2 Duo machine.
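
As a rough illustration of that last point, here is a minimal sketch (not part of the original answer) using a hypothetical heavy_mean function that bootstraps the column means; the 200 bootstrap replicates are an arbitrary choice, picked only so that each group costs far more CPU time than it costs to ship to a worker.

    library(plyr)
    library(doMC)
    library(rbenchmark)
    library(ggplot2)   # for the baseball data, as in the question
    registerDoMC(2)

    # Hypothetical, deliberately expensive per-group summary: bootstrap the mean
    # of every numeric column 200 times and average the bootstrap estimates.
    heavy_mean <- function(df) {
      numcolwise(function(x) {
        mean(replicate(200, mean(sample(x, replace = TRUE), na.rm = TRUE)))
      })(df)
    }

    benchmark(sequential = ddply(baseball, .(year), heavy_mean),
              parallel   = ddply(baseball, .(year), heavy_mean, .parallel = TRUE),
              replications = 5)

With per-group computation this heavy and the communication costs unchanged, the parallel run should come out ahead on a two-core machine.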


Running in parallel will be slower than running sequentially when the cost of communication between the nodes is greater than the time it takes to evaluate the function. In other words, it takes longer to send the data to and from the nodes than it does to perform the calculation.

For a given data set, the communication costs are roughly fixed, so parallel processing becomes more and more worthwhile as the time spent evaluating the function grows.
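
A quick way to see this (a sketch, not from the original answer) is to inflate the per-group evaluation time artificially. Here a made-up slow_fun sleeps for 50 ms per group while the data being shipped stays the same; the backend (doMC with 2 cores, as in the question) is an assumption.

    library(plyr)
    library(doMC)
    library(ggplot2)   # for the baseball data
    registerDoMC(2)

    # Made-up stand-in for an expensive per-group computation: the sleep dominates,
    # so communication costs become negligible by comparison.
    slow_fun <- function(df) {
      Sys.sleep(0.05)
      numcolwise(mean)(df)
    }

    system.time(ddply(baseball, .(year), slow_fun))                    # sequential
    system.time(ddply(baseball, .(year), slow_fun, .parallel = TRUE))  # parallel

Because the sleep dominates the run time, the elapsed time of the parallel version should drop by roughly the number of workers.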

UPDATE:
The code below shows that only about 0.14 seconds (on my machine) is spent evaluating .fun. With two workers each doing half of that work, the most a parallel run can save is about 0.07 seconds, so communication would have to cost less than that for parallelism to pay off, and that is not realistic for a data set the size of baseball.

    Rprof()
    system.time(ddply(baseball, .(year), numcolwise(mean)))
    #    user  system elapsed
    #    0.28    0.02    0.30
    Rprof(NULL)
    summaryRprof()$by.self
    #               self.time self.pct total.time total.pct
    # [.data.frame       0.04    12.50       0.10     31.25
    # unlist             0.04    12.50       0.10     31.25
    # match              0.04    12.50       0.04     12.50
    # .fun               0.02     6.25       0.14     43.75
    # structure          0.02     6.25       0.12     37.50
    # [[                 0.02     6.25       0.08     25.00
    # FUN                0.02     6.25       0.06     18.75
    # rbind.fill         0.02     6.25       0.06     18.75
    # anyDuplicated      0.02     6.25       0.02      6.25
    # gc                 0.02     6.25       0.02      6.25
    # is.array           0.02     6.25       0.02      6.25
    # list               0.02     6.25       0.02      6.25
    # mean.default       0.02     6.25       0.02      6.25

Here's a parallel version with snow:

    library(doSNOW)
    cl <- makeSOCKcluster(2)
    registerDoSNOW(cl)

    Rprof()
    system.time(ddply(baseball, .(year), numcolwise(mean), .parallel = TRUE))
    #    user  system elapsed
    #    0.46    0.01    0.73
    Rprof(NULL)
    summaryRprof()$by.self
    #                      self.time self.pct total.time total.pct
    # .Call                     0.24    33.33       0.24     33.33
    # socketSelect              0.16    22.22       0.16     22.22
    # lazyLoadDBfetch           0.08    11.11       0.08     11.11
    # accumulate.iforeach       0.04     5.56       0.06      8.33
    # rbind.fill                0.04     5.56       0.06      8.33
    # structure                 0.04     5.56       0.04      5.56
    # <Anonymous>               0.02     2.78       0.54     75.00
    # lapply                    0.02     2.78       0.04      5.56
    # constantFoldEnv           0.02     2.78       0.02      2.78
    # gc                        0.02     2.78       0.02      2.78
    # stopifnot                 0.02     2.78       0.02      2.78
    # summary.connection        0.02     2.78       0.02      2.78
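
One housekeeping note, not in the original answer: when you are finished with the socket cluster, release the worker processes with

    stopCluster(cl)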
