I have a program that generates a large amount (terabytes) of output and writes it to standard output.
I want to split this output and process it in parallel with a pool of instances of another program. The data can be distributed in any way, as long as the lines remain intact.
GNU parallel can do this, but it reads a fixed number of lines and then restarts the filter process for each chunk:
./relgen | parallel -l 100000 -j 32
Is there a way to keep a constant number of processes running and distribute the data among them?
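To illustrate the kind of fan-out I mean, here is a minimal sketch using plain awk: lines from stdin are routed round-robin to a fixed set of long-lived worker pipes, which are never restarted. Everything here is illustrative, not my real setup: `wc -l` stands in for the actual filter, `n=4` for the worker count, and `seq 100` for the real data source.

```shell
#!/bin/sh
# Fan stdin out round-robin to n persistent workers, keeping lines intact.
# awk opens each pipe once (one per distinct command string) and keeps it
# open until exit, so each worker process runs for the whole stream.
seq 100 | awk -v n=4 '{
    # Line NR goes to worker (NR mod n); "wc -l" is a placeholder filter.
    print | ("wc -l > out." (NR % n))
}'
# Each of the 4 workers saw 25 of the 100 lines:
cat out.0 out.1 out.2 out.3
```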