Will an I/O-blocked process show 100% CPU utilization in top's output?

I have an analysis that can be parallelized across a varying number of processes. The workload is expected to be both I/O- and CPU-intensive (very high-throughput short-read DNA alignment, if anyone is interested).

The system running this is a 48-core Linux server.

The question is how to determine the optimal number of processes to maximize overall throughput. At some point the processes will probably become I/O-bound, so that adding more processes will be useless and possibly harmful.

Can standard system monitoring tools tell me when this point has been reached? Will the output of top (or perhaps another tool) let me distinguish an I/O-bound process from a CPU-bound one? I suspect that a process blocked on I/O may still show 100% CPU utilization.

3 answers

Even a single I/O-bound process rarely shows high CPU utilization: once it has issued its I/O request, the operating system handles the transfer and the process usually just waits for it to complete. Thus top cannot accurately distinguish an I/O-bound process from a process that is not I/O-bound but simply uses the CPU only occasionally. In fact, a system that is badly overloaded with I/O-bound processes, and barely responsive, can show very low CPU utilization.
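To see why, here is a minimal sketch, assuming Linux's /proc filesystem and Python 3.9+, of how a top-like tool derives a process's %CPU: it reads the utime and stime counters (fields 14 and 15 of /proc/<pid>/stat) twice and divides the difference by the elapsed time. A process sleeping on I/O accumulates no ticks, so its %CPU stays low; its state field (field 3) shows D, uninterruptible sleep, which usually means it is waiting on disk I/O. The function names are hypothetical.

```python
import os
import time


def parse_stat(text: str) -> tuple[str, int]:
    """Return (state, utime + stime in clock ticks) from /proc/<pid>/stat text."""
    # Field 2 (comm) is in parentheses and may contain spaces,
    # so only split what follows the last ')'.
    rest = text[text.rindex(")") + 2:].split()
    state = rest[0]                              # field 3: R, S, D, ...
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return state, utime + stime


def cpu_percent(ticks_before: int, ticks_after: int,
                elapsed_s: float, clk_tck: int) -> float:
    """CPU time used over the interval, as a percentage of one CPU."""
    return 100.0 * (ticks_after - ticks_before) / (clk_tck * elapsed_s)


def sample(pid: int, interval_s: float = 1.0) -> tuple[str, float]:
    """Sample a process twice and return (latest state, %CPU)."""
    clk_tck = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        _, t0 = parse_stat(f.read())
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        state, t1 = parse_stat(f.read())
    elapsed = time.monotonic() - start
    return state, cpu_percent(t0, t1, elapsed, clk_tck)
```

Calling sample(pid) on one of the alignment workers while it is stuck in a large read would be expected to return state D with a small %CPU, even though the process is far from idle in terms of useful work.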

Using only top as a first pass, you can simply keep adding threads/processes until CPU utilization levels off, which gives an approximate configuration for this machine.
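The stopping rule described above can be sketched as a small helper, assuming you record the overall CPU utilization (0 to 100%, as in top's summary line) at each process count you try. The function name, the 2-percentage-point threshold, and the sample numbers below are illustrative assumptions, not measurements.

```python
def first_plateau(measurements: list[tuple[int, float]],
                  min_gain: float = 2.0) -> int:
    """Smallest process count beyond which CPU utilization stops improving.

    measurements: (process_count, overall CPU utilization in %) pairs.
    min_gain: smallest increase, in percentage points, that still counts
              as progress when moving to the next tested process count.
    """
    pts = sorted(measurements)
    for (n, util), (_, next_util) in zip(pts, pts[1:]):
        if next_util - util < min_gain:
            return n
    return pts[-1][0]  # never levelled off within the tested range
```

For example, first_plateau([(8, 16.0), (16, 32.0), (24, 45.0), (32, 46.0)]) returns 24, because going from 24 to 32 processes gains only one percentage point.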

When a process is blocked on I/O, it is not running, so no CPU time is charged to it. If there is another process that can run, that process will run; if not, the time is counted as "I/O wait", which is recorded as a global statistic.
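On Linux this global statistic is exposed in the first line of /proc/stat, where the iowait counter sits next to user, system and idle. A minimal sketch, assuming a kernel that reports at least the eight fields user through steal: take two snapshots a second or so apart and convert the deltas to percentages, which is roughly what top's CPU summary line shows. The function names are hypothetical.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # guest/guest_nice (if present) are already included in user/nice,
    # so only the first eight counters are used to avoid double counting.
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def cpu_breakdown(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Percentage of all CPU time spent in each state between two snapshots."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    return {k: 100.0 * v / total for k, v in delta.items()}


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # first line aggregates all CPUs
```

Usage: parse read_cpu_line() once, sleep for a second, parse it again, and look at cpu_breakdown(before, after)["iowait"]. A high value together with low user time suggests that adding more processes is unlikely to help.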

I/O wait would be a useful thing to monitor. It appears in the top header as something like %wa. You can monitor it in more detail with tools like iostat and vmstat. Server Fault might be a better place to ask about this.

You can use tools like iostat and vmstat to show how much time processes spend blocked on I/O. There is usually no harm in adding more processes than you need, but the returns diminish. You should measure the total throughput of the processes as the measure of overall efficiency.
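One way to act on this advice is to tabulate, for each process count you try, the total throughput, the throughput per process, and the marginal gain per process added since the previous run; once the marginal gain approaches zero, extra processes are not buying anything. This is a sketch; the function name and the example numbers (reads aligned per second) are hypothetical.

```python
def scaling_report(throughput: dict[int, float]) -> list[tuple[int, float, float, float]]:
    """Rows of (processes, total throughput, per-process throughput, marginal gain).

    throughput maps a process count to the measured total throughput,
    e.g. reads aligned per second over a fixed benchmark input.
    """
    rows = []
    prev_n, prev_t = None, None
    for n in sorted(throughput):
        t = throughput[n]
        if prev_n is None:
            marginal = t / n  # first run: gain per process relative to nothing
        else:
            marginal = (t - prev_t) / (n - prev_n)  # gain per added process
        rows.append((n, t, t / n, marginal))
        prev_n, prev_t = n, t
    return rows
```

For example, scaling_report({1: 100.0, 2: 190.0, 4: 300.0}) returns [(1, 100.0, 100.0, 100.0), (2, 190.0, 95.0, 90.0), (4, 300.0, 75.0, 55.0)]: each added process still helps, but less than the one before.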

Source: https://habr.com/ru/post/1333703/

