Split text and process it in parallel

I have a program that generates a lot of output (terabytes) and writes it to standard output.

I want to split this output and process it in parallel with a number of instances of another program. The data can be distributed in any way, as long as the lines remain intact.

GNU parallel can do this, but it reads a fixed number of lines and then restarts the filter process:

./relgen | parallel -l 100000 -j 32 --spreadstdin ./filter 

Is there a way to keep a constant set of running processes and distribute the data among them?

1 answer

-l is not good for performance. Use --block if possible: splitting the stream into fixed-size byte blocks is much faster than counting lines.

You can distribute the data round-robin with --roundrobin: parallel keeps the same -j filter processes running and hands each new block to one of them, instead of starting a fresh process per block.

 ./relgen | parallel --block 3M --roundrobin -j 32 --pipe ./filter
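
A quick way to see the distribution (a minimal sketch: seq stands in for ./relgen and wc -l for ./filter, both placeholders; each of the 32 jobs prints how many lines it received):

 # Generate 10M lines, deal 3MB blocks round-robin to 32 long-running wc -l jobs
 seq 10000000 | parallel --block 3M --roundrobin -j 32 --pipe wc -l

Note that with --roundrobin the output of the jobs is interleaved, so the order of the output lines is not guaranteed to match the input.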
