Is the UNIX `time` command accurate enough for tests?

Question

Let's say I wanted to compare two programs: foo.py and bar.py.

Are several thousand runs of `time python foo.py` and `time python bar.py`, and the corresponding average values, sufficient to profile and compare their speed?

Edit: Also, if the execution of each program was sub-second (suppose it wasn't for the above), would `time` be OK to use?

+38

profiling benchmarking linux unix

chrisdotcode Jan 25 '12 at 16:52

4 answers

Currently, imo, it makes no sense to use time for benchmarking purposes. Use perf stat instead. It gives you much more useful information, and it can repeat the benchmark a given number of times and compute statistics over the results, i.e. the mean and variance. This is much more reliable and just as easy to use as time:

 perf stat -r 10 -d <your app and arguments>

-r 10 runs your application 10 times and computes statistics over the runs. -d prints some additional data, such as cache misses.

So, while time can be reliable enough for long-running applications, it is definitely not as reliable as perf stat. Use that instead.
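Where perf is not available, the repeat-and-summarize idea can be approximated with a short script. This is only a rough sketch, not a substitute for perf stat: the helper names `time_command` and `summarize` are made up for illustration, the measurement includes process startup, and the `-c` workload is a placeholder for something like `python foo.py`.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=10):
    """Run argv `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation) of the measured durations."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__":
    # Placeholder workload (an assumption); replace with [sys.executable, "foo.py"].
    samples = time_command([sys.executable, "-c", "sum(range(100000))"], runs=5)
    mean, stdev = summarize(samples)
    print(f"mean {mean:.4f} s, stdev {stdev:.4f} s over {len(samples)} runs")
```

A large standard deviation relative to the mean suggests the runs are too short or the machine is too noisy for the comparison to be meaningful.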

Addendum: If you really want to use time, at least do not use the bash builtin, but the real program in verbose mode:

 /usr/bin/time -v <some command with arguments>

The output is then, for example:

  Command being timed: "ls"
  User time (seconds): 0.00
  System time (seconds): 0.00
  Percent of CPU this job got: 0%
  Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
  Average shared text size (kbytes): 0
  Average unshared data size (kbytes): 0
  Average stack size (kbytes): 0
  Average total size (kbytes): 0
  Maximum resident set size (kbytes): 1968
  Average resident set size (kbytes): 0
  Major (requiring I/O) page faults: 0
  Minor (reclaiming a frame) page faults: 93
  Voluntary context switches: 1
  Involuntary context switches: 2
  Swaps: 0
  File system inputs: 8
  File system outputs: 0
  Socket messages sent: 0
  Socket messages received: 0
  Signals delivered: 0
  Page size (bytes): 4096
  Exit status: 0

In particular, note that this lets you measure the peak RSS ("Maximum resident set size"), which is often all you need when you want to compare the effect of a patch on peak memory consumption. That is, compare this value before and after the patch; if the peak RSS decreases significantly, you did something right.
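The same before/after comparison can be scripted. The sketch below is an illustration under assumptions: `peak_rss_of` is a hypothetical helper name, it relies on Unix-only calls (`os.spawnvp`, `os.wait4`), and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import os
import sys


def peak_rss_of(argv):
    """Run argv to completion and return (exit_code, peak RSS from the kernel).

    ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    """
    pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
    _, status, usage = os.wait4(pid, 0)  # reaps the child and returns its rusage
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return exit_code, usage.ru_maxrss


if __name__ == "__main__":
    # Placeholder workloads (assumptions) standing in for "before" and "after".
    before = peak_rss_of([sys.executable, "-c", "x = b'x' * (64 * 1024 * 1024)"])
    after = peak_rss_of([sys.executable, "-c", "pass"])
    print("peak RSS before:", before[1], "after:", after[1])
```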

+31

milianw Nov 12 '14 at 11:24

Yes, time is accurate enough. And you only need to run your programs a dozen times (provided that each run lasts more than a second, or a significant fraction of a second, that is, at least more than 200 milliseconds). Of course, the file system will be hot (i.e., small files will already be cached in RAM) for most runs (except the first), so keep this in mind.

The reason you want the timed run to last at least a few tenths of a second is the accuracy and granularity of the time measurement. Do not expect accuracy better than a hundredth of a second. (You need a special kernel option to get it down to a millisecond.)

Inside the application, you can use clock, clock_gettime, gettimeofday, getrusage, times (they probably have Python equivalents).

Remember to read the time(7) man page.
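In Python, the standard library exposes close counterparts: `time.perf_counter()` for a monotonic wall clock, and `resource.getrusage()` for user and system CPU time (`time.process_time()` and `os.times()` also exist). A minimal sketch of timing one call from inside the program, where `measure` is a made-up helper name and the `resource` module is Unix-only:

```python
import resource
import time


def measure(func, *args):
    """Call func(*args); return (result, wall seconds, user CPU s, system CPU s)."""
    usage_before = resource.getrusage(resource.RUSAGE_SELF)
    wall_before = time.perf_counter()
    result = func(*args)
    wall = time.perf_counter() - wall_before
    usage_after = resource.getrusage(resource.RUSAGE_SELF)
    user = usage_after.ru_utime - usage_before.ru_utime
    system = usage_after.ru_stime - usage_before.ru_stime
    return result, wall, user, system


if __name__ == "__main__":
    result, wall, user, system = measure(sum, range(5_000_000))
    print(f"wall {wall:.4f} s, user {user:.4f} s, system {system:.4f} s")
```

Measuring inside the process leaves out interpreter startup, which matters when the whole run is sub-second.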

+5

Basile Starynkevitch Jan 25 '12 at 16:59

Yes. The time command gives both the elapsed time and the CPU time consumed. The latter is probably what you should focus on, unless you are doing a lot of I/O. If elapsed time is what matters, make sure the system has no other significant load during the test.

+3

schtever Jan 25 2018-12-12T00:

Maxim Egorushkin · Accepted Answer · 2012-01-25 17:04

time gives reasonably good timings for benchmarks that run for longer than one second; for shorter runs, the time spent exec()ing the process can be large compared to its run time.

However, when benchmarking, you should be aware of context switching. That is, another process may compete with your benchmark for the CPU and increase its run time. To avoid contention with other processes, run the benchmark like this:

 sudo chrt -f 99 /usr/bin/time --verbose <benchmark>

sudo chrt -f 99 runs your benchmark in the FIFO real-time scheduling class with priority 99, which makes it the highest-priority process and avoids context switching (you can change your /etc/security/limits.conf so that using real-time priorities does not require a privileged process).

The --verbose option also makes time report all available statistics, including the number of context switches your benchmark incurred, which should normally be 0; otherwise you may want to rerun the benchmark.

It is also better to disable CPU frequency scaling and frequency boost, so that the CPU frequency stays constant during the benchmark and the results are consistent.
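The "rerun if there were context switches" check can be automated when the benchmark is launched from a script. The following is a sketch under assumptions: `context_switches_of` is a hypothetical helper, it uses the Unix-only `os.wait4`, and it does not set the real-time priority itself (that would still be done with chrt).

```python
import os
import sys


def context_switches_of(argv):
    """Run argv to completion; return (voluntary, involuntary) context switches."""
    pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
    _, _, usage = os.wait4(pid, 0)  # rusage of the finished child
    return usage.ru_nvcsw, usage.ru_nivcsw


if __name__ == "__main__":
    # Placeholder workload (an assumption); replace with the real benchmark.
    voluntary, involuntary = context_switches_of(
        [sys.executable, "-c", "sum(range(1000000))"])
    print(f"voluntary: {voluntary}, involuntary: {involuntary}")
    if involuntary > 0:
        print("the benchmark was preempted; consider repeating the run")
```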
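Before a run, it can help to check which frequency governor each CPU is using. This sketch assumes a Linux system that exposes the cpufreq sysfs files under /sys/devices/system/cpu; on machines without them (virtual machines, other operating systems) it simply returns an empty result. The function name `read_scaling_governors` is made up for illustration.

```python
from pathlib import Path


def read_scaling_governors(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return {cpu name: governor} for every CPU that exposes a cpufreq governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(sysfs_cpu_dir).glob(pattern)):
        try:
            governors[path.parent.parent.name] = path.read_text().strip()
        except OSError:
            continue  # unreadable entry; skip it
    return governors


if __name__ == "__main__":
    governors = read_scaling_governors()
    if not governors:
        print("no cpufreq information available on this system")
    for cpu, governor in governors.items():
        print(f"{cpu}: {governor}")
```

A governor such as `performance` on every CPU is the usual sign that frequency scaling will not vary during the run; frequency boost is controlled separately and is not covered by this check.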
