Split text and process it in parallel

I have a program that generates a lot of output (terabytes) and writes it to standard output.

I want to split this output and process it in parallel with a number of instances of another program. The data can be distributed in any way, as long as the lines remain intact.

GNU parallel can do this, but it reads a fixed number of lines and then restarts the filter process:

./relgen | parallel -l 100000 -j 32 --spreadstdin ./filter 

Is there a way to keep a constant set of running processes and distribute the data among them?

1 answer

-l is not good for performance. Use --block if possible: splitting the stream into fixed-size byte blocks is much faster than counting lines.

You can distribute the data round-robin with --roundrobin: parallel keeps the same -j filter processes running and hands each new block to one of them, instead of starting a fresh process per block.

 ./relgen | parallel --block 3M --roundrobin -j 32 --pipe ./filter
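
A quick way to see the distribution (a minimal sketch: seq stands in for ./relgen and wc -l for ./filter, both placeholders; each of the 32 jobs prints how many lines it received):

 # Generate 10M lines, deal 3MB blocks round-robin to 32 long-running wc -l jobs
 seq 10000000 | parallel --block 3M --roundrobin -j 32 --pipe wc -l

Note that with --roundrobin the output of the jobs is interleaved, so the order of the output lines is not guaranteed to match the input.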
