Following up on data.table and parallel computing, I am trying to find a way to parallelize an operation on a data.table.
I have a data.table with 4 million rows and 14 columns and would like to put it in shared memory so that operations on it can be parallelized with the "parallel" package's parLapply, without having to copy the table to every node in the cluster (which is what parLapply does). At the moment, the cost of moving the data.table around is greater than the benefit of the parallel computation.
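For reference, this is roughly what I am doing at the moment (only a minimal sketch; the data and the per-column operation are placeholders, not my real code):

```r
library(data.table)
library(parallel)

# Placeholder data of roughly the same shape: 4 million rows, 14 numeric columns
dt <- as.data.table(replicate(14, rnorm(4e6), simplify = FALSE))

cl <- makeCluster(4)                   # PSOCK cluster: separate R processes
clusterEvalQ(cl, library(data.table))  # load data.table on every worker

# clusterExport() serialises dt and sends a full copy to every worker;
# for a table this size the copying dominates the total run time
clusterExport(cl, "dt")

# Placeholder operation, one task per column
res <- parLapply(cl, names(dt), function(col) mean(dt[[col]]))

stopCluster(cl)
```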
I found the "bigmemory" package as an answer for sharing memory, but it does not support the data.table structure (a sketch of what I tried is below the list). So does anyone know a way to:
1) put a data.table in shared memory,
2) preserve the data.table structure while doing so, and
3) run parallel processing on this data.table?
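This is roughly how far I got with "bigmemory" (again only a sketch under my assumptions, with smaller placeholder data):

```r
library(data.table)
library(bigmemory)

# Placeholder data, smaller than my real table
dt <- as.data.table(replicate(14, rnorm(1e5), simplify = FALSE))

# as.big.matrix() stores a single atomic type, so the data.table is flattened
# to a plain numeric matrix; keys, by-reference updates and the data.table
# syntax are lost in the conversion
bm   <- as.big.matrix(as.matrix(dt))
desc <- describe(bm)  # descriptor a worker can pass to attach.big.matrix()

# On a worker, attach.big.matrix(desc) gives access to the shared matrix,
# but I see no way to treat it as a data.table again without copying it back.
```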
Thanks in advance!