Following up on data.table and parallel computing, I am trying to find a way to parallelize an operation on a data.table.
I have a data.table with 4 million rows and 14 columns and would like to put it in shared memory so that operations on it can be parallelized with the "parallel" package's parLapply, without having to copy the table to every node in the cluster (which is what parLapply does). At the moment, the cost of moving the data.table around is greater than the benefit of the parallel computation.
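For reference, this is roughly what I am doing at the moment (only a minimal sketch; the data and the per-column operation are placeholders, not my real code):

```r
library(data.table)
library(parallel)

# Placeholder data of roughly the same shape: 4 million rows, 14 numeric columns
dt <- as.data.table(replicate(14, rnorm(4e6), simplify = FALSE))

cl <- makeCluster(4)                   # PSOCK cluster: separate R processes
clusterEvalQ(cl, library(data.table))  # load data.table on every worker

# clusterExport() serialises dt and sends a full copy to every worker;
# for a table this size the copying dominates the total run time
clusterExport(cl, "dt")

# Placeholder operation, one task per column
res <- parLapply(cl, names(dt), function(col) mean(dt[[col]]))

stopCluster(cl)
```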
I found the "bigmemory" package as an answer for sharing memory, but it does not support the data.table structure (a sketch of what I tried is below the list). So does anyone know a way to:
1) put a data.table in shared memory,
2) preserve the data.table structure while doing so, and
3) run parallel processing on this data.table?
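This is roughly how far I got with "bigmemory" (again only a sketch under my assumptions, with smaller placeholder data):

```r
library(data.table)
library(bigmemory)

# Placeholder data, smaller than my real table
dt <- as.data.table(replicate(14, rnorm(1e5), simplify = FALSE))

# as.big.matrix() stores a single atomic type, so the data.table is flattened
# to a plain numeric matrix; keys, by-reference updates and the data.table
# syntax are lost in the conversion
bm   <- as.big.matrix(as.matrix(dt))
desc <- describe(bm)  # descriptor a worker can pass to attach.big.matrix()

# On a worker, attach.big.matrix(desc) gives access to the shared matrix,
# but I see no way to treat it as a data.table again without copying it back.
```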
Thanks in advance!