Why is my program running faster when I overload the system with another job?

I was doing some timing and efficiency tests and ran into unexpected behavior: my program actually runs faster when I start other background processes that pin all CPU cores at 100%. Here is a simplified example program:

#define _XOPEN_SOURCE 600
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

void vadd(const float *u, const float *v, float *y, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        y[i] = u[i] + v[i];
    }
}

int main(int argc, char *argv[])
{
    int i, its = 100000, n = 16384;
    float *a, *b, *c;
    clock_t start, end;
    double cpu_time;

    /* Make sure alignment is the same on each run. */
    posix_memalign((void**)&a, 16, sizeof(float) * n);
    posix_memalign((void**)&b, 16, sizeof(float) * n);
    posix_memalign((void**)&c, 16, sizeof(float) * n);

    /* Some arbitrary initialization */
    for (i = 0; i < n; i++) {
        a[i] = i;
        b[i] = 4;
        c[i] = 0;
    }

    /* Now the real work */
    start = clock();
    for (i = 0; i < its; i++) {
        vadd(a, b, c, n);
    }
    end = clock();

    cpu_time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("Done, cpu time: %f\n", cpu_time);
    return 0;
}

I am working on a rather old Pentium 4 @ 2.8GHz with Hyper-Threading enabled, which shows up as two processors in /proc/cpuinfo.

Output with the system otherwise idle:

$ ./test
Done, cpu time: 11.450000

And now with all cores loaded:

$ md5sum /dev/zero& ./test; killall md5sum
Done, cpu time: 8.930000

This result is consistent. My guess is that I am somehow improving cache efficiency by reducing how often the program migrates to the other processor, but that is just a shot in the dark. Can anyone confirm or deny this?

Secondary question: I was surprised to find that cpu_time can vary greatly from run to run. The method used above is taken straight from the GNU C manual, and I thought that using clock() would protect me from timing fluctuations caused by other processes using the CPU. Clearly, based on the results above, that is not the case. So my secondary question is: is the clock() method the right way to measure performance?

Update: I looked into the suggestion in the comments about CPU frequency scaling, and I don't think that is what is going on here. I tried to monitor the CPU speed in real time with watch grep \"cpu MHz\" /proc/cpuinfo (as suggested here), and I do not see the frequency change while the program is running. I should also mention that I am running a rather old kernel: 2.6.25.

Update 2: I started using the script below to play around with the number of md5sum processes running. Even when I run more processes than there are logical processors, it is still faster than running on its own.

Update 3: If I disable Hyper-Threading in the BIOS, the strange behavior disappears and the run always takes about 11 seconds of CPU time. Hyper-Threading seems to have something to do with it.

Update 4: I just ran this on a dual-core Intel Xeon @ 2.5GHz and did not see any of this strange behavior. This "problem" may be fairly specific to my particular hardware setup.

#!/bin/bash
declare -i num=$1
for (( num; num; num-- )); do
    md5sum /dev/zero &
done
time ./test
killall md5sum


$ ./run_test.sh 5
Done, cpu time: 9.070000

real    0m27.738s
user    0m9.021s
sys     0m0.052s

$ ./run_test.sh 2
Done, cpu time: 9.240000

real    0m15.297s
user    0m9.169s
sys     0m0.080s

$ ./run_test.sh 0
Done, cpu time: 11.040000

real    0m11.041s
user    0m11.041s
sys     0m0.004s
2 answers

So my secondary question is: is the clock() method the right way to measure performance?

You can use clock_gettime(2) and friends. Also read time(7).
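
As a rough sketch of what that could look like (this assumes a POSIX/Linux system; on older glibc you may need to link with -lrt), you wrap the measured region with clock_gettime() on the per-process CPU-time clock:

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;
    double cpu_seconds;

    /* CPU time consumed by this process, at nanosecond resolution */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);

    /* ... the code being measured goes here ... */

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
    cpu_seconds = (end.tv_sec - start.tv_sec)
                + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("CPU time: %f s\n", cpu_seconds);
    return 0;
}

Use CLOCK_MONOTONIC instead if you want elapsed wall-clock time rather than CPU time.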

The details can be hardware specific (i.e. CPU + motherboard) and kernel specific.


For a single process running on a core, clock() should return the time that process spent executing. That includes time the core actually spent executing, plus time the core spent waiting for things like fetching instructions and data from cache/memory, waiting for the result of one instruction that another instruction needed, and so on. Basically, for this case, clock() returns "time spent executing, plus lots of tiny gaps".

With hyper-threading, the same core is shared by two "logical CPUs". The core uses all those tiny gaps in one process to execute the other process, so the core gets more total work done in less time (because less time is wasted waiting). In that case, what should the clock() function measure?

For example, if two processes each run on the same core for 10 seconds, should clock() say that each process used 10 seconds, or should it say that each process used half of those 10 seconds?

My theory is that on your system clock() returns something like "time the core was in use, divided among the processes using it". With one process running for 10 seconds, clock() returns "10 seconds"; with two such processes sharing the core, they might finish in 16 seconds instead of 20 (because the core wastes less time on the "gaps"), and clock() returns "16 / 2 = 8 seconds per process". That makes it look as if the process ran 2 seconds faster under heavy load, even though 16 seconds of elapsed time went by instead of 10.
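
One rough way to check this would be to time the same workload with both clock() and a wall-clock source such as CLOCK_MONOTONIC and compare: if this theory holds, adding load should shrink the reported CPU time while the elapsed wall time grows. A sketch (do_work() here is only a placeholder for the vadd() loop from the question):

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* placeholder for the real workload, e.g. the vadd() loop */
static void do_work(void)
{
    volatile double x = 0.0;
    long i;
    for (i = 0; i < 100000000L; i++) {
        x += i * 0.5;
    }
}

int main(void)
{
    struct timespec w0, w1;
    clock_t c0, c1;
    double wall;

    clock_gettime(CLOCK_MONOTONIC, &w0);
    c0 = clock();

    do_work();

    c1 = clock();
    clock_gettime(CLOCK_MONOTONIC, &w1);

    wall = (w1.tv_sec - w0.tv_sec) + (w1.tv_nsec - w0.tv_nsec) / 1e9;
    printf("CPU time : %f s\n", (double)(c1 - c0) / CLOCKS_PER_SEC);
    printf("wall time: %f s\n", wall);
    return 0;
}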


Source: https://habr.com/ru/post/1494070/

