I want to benchmark C/C++ code. I want to measure CPU time, wall time and cycles/byte. I wrote some measurement functions, but I have a problem with cycles/byte.
To get the CPU time, I wrote a function that uses getrusage() with RUSAGE_SELF; for wall time I use clock_gettime with CLOCK_MONOTONIC; to get cycles/byte I use rdtsc.
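For reference, the three helpers look roughly like the sketch below. This is a minimal sketch, not my exact code: the names cpu_seconds, wall_seconds and cycle_count are made up for illustration (my real functions are called cputime, walltime and cyclecount and take an argument), and the cycle counter assumes an x86 compiler that provides `__rdtsc()` in `<x86intrin.h>` (GCC or Clang). Keep in mind that on most modern x86 CPUs the TSC ticks at a constant reference rate, so it may not match the actual core clock.

```c
#define _XOPEN_SOURCE 700   /* for clock_gettime and getrusage under strict -std modes */
#include <stdint.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* CPU time (user + system) consumed by this process so far, in seconds.
 * Returns -1.0 if getrusage() fails. */
double cpu_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_stime.tv_sec
         + ((double)ru.ru_utime.tv_usec + (double)ru.ru_stime.tv_usec) / 1e6;
}

/* Monotonic wall-clock reading in seconds; only differences are meaningful.
 * Returns -1.0 if clock_gettime() fails. */
double wall_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Raw time-stamp counter on x86; returns 0 on other architectures
 * (placeholder only, an assumption of this sketch). */
uint64_t cycle_count(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;
#endif
}
```

Elapsed values are obtained by reading a helper before and after the measured region and subtracting.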
I process an input buffer of a given size, for example 1024 bytes: char buffer[1024]. This is how I benchmark:
- Do a warm-up phase: just call fun2measure(args) 1000 times:
`for (int i = 0; i < 1000; i++) fun2measure(args);`
- Then do the actual timed run, for wall time:
`unsigned long i; double timeTaken; double timeTotal = 3.0; // run for 3 seconds
for (timeTaken = 0.0, i = 0; timeTaken <= timeTotal; timeTaken = walltime(1), i++) fun2measure(args);`
And for CPU time (almost the same, but with cputime as the stop condition):
`for (timeTaken = 0.0, i = 0; timeTaken <= timeTotal; timeTaken = cputime(1), i++) fun2measure(args);`
But when I want to get the CPU cycle count for the function, I wrap the same loops in two cyclecount() calls. With wall time as the stop condition:
`unsigned long s = cyclecount();
for (timeTaken = 0.0, i = 0; timeTaken <= timeTotal; timeTaken = walltime(1), i++) { fun2measure(args); }
unsigned long e = cyclecount();`
And with CPU time as the stop condition:
`unsigned long s = cyclecount();
for (timeTaken = 0.0, i = 0; timeTaken <= timeTotal; timeTaken = cputime(1), i++) { fun2measure(args); }
unsigned long e = cyclecount();`
and then compute cycles/byte as (e - s) / (i * inputsSize); here inputsSize is 1024 because that is the length of buffer. But when I raise timeTotal to 10 s, I get strange results:
For 10 s:
Did fun2measure 1148531 times in 10.00 seconds for 1024 bytes, 0 cycles/byte [CPU]
Did fun2measure 1000221 times in 10.00 seconds for 1024 bytes, 3.000000 cycles/byte [WALL]
For 5 s:
Did fun2measure 578476 times in 5.00 seconds for 1024 bytes, 0 cycles/byte [CPU]
Did fun2measure 499542 times in 5.00 seconds for 1024 bytes, 7.000000 cycles/byte [WALL]
For 4 s:
Did fun2measure 456828 times in 4.00 seconds for 1024 bytes, 4 cycles/byte [CPU]
Did fun2measure 396612 times in 4.00 seconds for 1024 bytes, 3.000000 cycles/byte [WALL]
My questions:
- Are these results plausible?
- Why do I always get 0 cycles/byte for the CPU-time run when I increase the duration?
- How can I compute statistics such as mean, standard deviation, etc. for this kind of benchmark?
- Is my benchmarking method sound?
CHEERS!
1st EDIT:
After changing i to double:
Did fun2measure 1138164.00 times in 10.00 seconds for 1024 bytes, 0.410739 cycles/byte [CPU]
Did fun2measure 999849.00 times in 10.00 seconds for 1024 bytes, 3.382036 cycles/byte [WALL]
my results look OK, so question 2 is no longer a question :)
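As far as I can tell, the 0 came from integer division: with e, s and i all unsigned long, (e - s) / (i * inputsSize) is truncated toward zero, so any value below 1 cycle/byte prints as 0. A minimal sketch of the difference follows; the function names and the numbers in the usage note are made up for illustration and are not measured values.

```c
#include <stdint.h>

/* Integer version: the quotient is truncated toward zero, so anything
 * below 1 cycle/byte becomes 0. The caller must ensure iterations * bytes != 0. */
uint64_t cycles_per_byte_int(uint64_t cycles, uint64_t iterations, uint64_t bytes)
{
    return cycles / (iterations * bytes);
}

/* Floating-point version: converts before dividing and keeps the fraction. */
double cycles_per_byte_fp(uint64_t cycles, uint64_t iterations, uint64_t bytes)
{
    return (double)cycles / ((double)iterations * (double)bytes);
}
```

For example, with 500000000 cycles over 1000000 iterations of 1024 bytes, the integer version yields 0 while the floating-point version yields 0.48828125.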