I have an analysis that can be parallelized across an arbitrary number of processes. The workload is expected to stress both IO and the CPU (it's very high-throughput short-read DNA alignment, if anyone is interested).
It will run on a 48-core Linux server.
The question is how to determine the number of processes that maximizes overall throughput. At some point the processes will become IO-bound, so that adding more of them is useless and possibly harmful.
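For context, my current plan is just a brute-force sweep: run the same fixed batch of reads at several worker counts and time each run. A rough sketch of what I mean is below; `run_alignment.sh` is a hypothetical wrapper around my pipeline, not a real script.

```python
#!/usr/bin/env python3
"""Brute-force sweep: time the same workload at different process counts."""
import subprocess
import time

def run_batch(num_procs: int) -> float:
    """Run one fixed batch of reads with num_procs workers; return seconds."""
    start = time.monotonic()
    subprocess.run(
        ["./run_alignment.sh", str(num_procs)],  # hypothetical wrapper script
        check=True,
    )
    return time.monotonic() - start

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 24, 32, 48):
        elapsed = run_batch(n)
        print(f"{n:2d} processes: {elapsed:8.1f} s "
              f"({1.0 / elapsed:.4f} batches/s)")
```

That tells me where throughput flattens out, but not *why*, which is what I'd really like the monitoring tools to answer.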
Can standard system monitoring tools tell me when that point has been reached? Will the output of top (or another tool) let me distinguish IO-bound processes from CPU-bound ones? I suspect that a process blocked on IO may still show up as 100% CPU utilization.
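The closest thing I've come up with myself is polling the worker state in /proc, on the theory that a task in uninterruptible sleep ("D" state) is usually waiting on disk IO while "R" means it is actually running on a CPU, but I don't know how reliable that is as an indicator. A minimal sketch of that idea:

```python
#!/usr/bin/env python3
"""Sample /proc/<pid>/stat and count how often each worker is in 'D'
(uninterruptible sleep, typically disk IO) versus 'R' (running)."""
import sys
import time
from collections import Counter

def proc_state(pid: int) -> str:
    # /proc/<pid>/stat looks like: "<pid> (<comm>) <state> <ppid> ...".
    # The command name can contain spaces, so split after the closing
    # parenthesis rather than naively on whitespace.
    with open(f"/proc/{pid}/stat") as f:
        rest = f.read().rsplit(")", 1)[1].split()
    return rest[0]

if __name__ == "__main__":
    pids = [int(p) for p in sys.argv[1:]]
    counts = {pid: Counter() for pid in pids}
    for _ in range(100):            # roughly 10 seconds of samples
        for pid in pids:
            try:
                counts[pid][proc_state(pid)] += 1
            except FileNotFoundError:
                pass                # worker already exited
        time.sleep(0.1)
    for pid, c in counts.items():
        print(pid, dict(c))
```

If there is an established tool that reports this properly (top, iostat, or something else), I'd rather use that than my own polling.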