I am using a simple profiler to measure the performance of C code on a school server, and I am seeing something odd. After a short time (about half a second), the processor suddenly starts executing instructions twice as fast. I have checked every cause I could think of (caching, load balancing across cores, the CPU frequency changing as it comes out of a sleep state), but everything looks normal.
For what it's worth, I am running these tests on a school Linux server, so there may be an unusual configuration that I don't know about, but the processor ID does not change and, according to top, the server was completely idle while I was testing.
Test code:
    #include <time.h>
    #include <stdio.h>

    #define MY_CLOCK CLOCK_MONOTONIC_RAW
    // no difference if set to CLOCK_THREAD_CPUTIME_ID

    typedef struct {
        unsigned int tsc;   // low 32 bits of the time-stamp counter
        unsigned int proc;  // processor ID reported by rdtscp
    } ans_t;

    static ans_t rdtscp(void){
        ans_t ans;
        __asm__ __volatile__ ("rdtscp" : "=a"(ans.tsc), "=c"(ans.proc) : : "edx");
        return ans;
    }

    static void nop(void){
        __asm__ __volatile__ ("");
    }

    void test(){
        for(int i=0; i<100000000; i++)
            nop();
    }

    int main(){
        int c=10;
        while(c-->0){
            struct timespec tstart,tend;
            ans_t start = rdtscp();
            clock_gettime(MY_CLOCK,&tstart);
            test();
            ans_t end = rdtscp();
            clock_gettime(MY_CLOCK,&tend);
            unsigned int tdiff = (tend.tv_sec-tstart.tv_sec)*1000000000+tend.tv_nsec-tstart.tv_nsec;
            unsigned int cdiff = end.tsc-start.tsc;
            printf("%u cycles and %u ns (%lf GHz) start proc %u end proc %u\n",cdiff,tdiff,(double)cdiff/tdiff,start.proc,end.proc);
        }
    }
The output I see:
    351038093 cycles and 125680883 ns (2.793091 GHz) start proc 14 end proc 14
    350911246 cycles and 125639359 ns (2.793004 GHz) start proc 14 end proc 14
    350959546 cycles and 125656776 ns (2.793001 GHz) start proc 14 end proc 14
    351533280 cycles and 125862608 ns (2.792992 GHz) start proc 14 end proc 14
    350903833 cycles and 125636787 ns (2.793002 GHz) start proc 14 end proc 14
    350924336 cycles and 125644157 ns (2.793002 GHz) start proc 14 end proc 14
    349827908 cycles and 125251782 ns (2.792997 GHz) start proc 14 end proc 14
    175289886 cycles and 62760404 ns (2.793001 GHz) start proc 14 end proc 14
    175283424 cycles and 62758093 ns (2.793001 GHz) start proc 14 end proc 14
    175267026 cycles and 62752232 ns (2.793001 GHz) start proc 14 end proc 14
I see the same behavior (the speed doubles, though after a different number of runs) at every optimization level from -O0 to -O3.
Could this be due to hyperthreading, where the two logical cores of one physical core (the server uses Xeon X5560s, which support hyperthreading) somehow "merge" into a single core that is twice as fast?