My idea was to give an elegant code example demonstrating the effect of instruction cache restrictions. I wrote the following code fragment, which creates a large number of identical functions using template metaprogramming.
```cpp
// MAX_FUNCS and WORKLOAD are assumed to be supplied as compile-time
// macros (e.g. via -D); see the build flags below.
volatile int checksum;
void (*funcs[MAX_FUNCS])(void);

// One tiny function per template instantiation; noinline keeps each
// instance as a separate piece of code in memory.
template <unsigned t>
__attribute__ ((noinline)) static void work(void) { ++checksum; }

// Recursively fill the function-pointer table with
// work<0> .. work<MAX_FUNCS - 1>.
template <unsigned t>
static void create(void) { funcs[t - 1] = &work<t - 1>; create<t - 1>(); }

template <> void create<0>(void) { }

int main()
{
    create<MAX_FUNCS>();
    for (unsigned range = 1; range <= MAX_FUNCS; range *= 2)
    {
        checksum = 0;
        for (unsigned i = 0; i < WORKLOAD; ++i)
        {
            funcs[i % range]();
        }
    }
    return 0;
}
```
The outer loop varies the number of different functions that are called through the function-pointer table. For each pass of the loop, the time spent on the WORKLOAD function calls is measured. What are the results? The following chart shows the average runtime per function call as a function of the range used. The blue curve shows data measured on a Core i7 machine; the comparative measurement shown by the red curve was carried out on a Pentium 4 machine. But when it comes to interpreting these curves, I seem to be struggling.
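For reference, the per-pass timing looks roughly like the sketch below. This is a reconstruction rather than the exact harness I used: it assumes std::chrono::steady_clock is precise enough for these durations, and that the function is dropped into the fragment above, where funcs and WORKLOAD are visible.

```cpp
#include <chrono>
#include <ratio>

// Time one pass of the inner loop and return the average
// nanoseconds per call (this is what the curves plot).
static double time_pass(unsigned range)
{
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    for (unsigned i = 0; i < WORKLOAD; ++i)
    {
        funcs[i % range]();
    }
    auto stop = clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / WORKLOAD;
}
```

In main, the body of the inner loop would then be replaced by a call such as `double ns_per_call = time_pass(range);`, with one data point recorded per range.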

The red curve is piecewise constant, and its only jumps occur exactly where the total memory consumption of all functions within the range exceeds the capacity of one cache level on the test machine, which has no dedicated instruction cache. For very small ranges (below 4 in this case), however, the runtime increases with the number of functions. This may be related to branch prediction efficiency, but since every call here reduces to an unconditional jump, I am not sure whether there should be any branch penalty at all.
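To sanity-check that explanation, here is a back-of-the-envelope sketch of where the jumps should fall. The 16-byte per-function footprint and the cache sizes are assumptions for illustration, not values measured on either machine; the real per-instance size could be read off with a disassembler or by comparing adjacent entries of funcs.

```cpp
#include <cstdio>

int main()
{
    // Assumed size of one work<t> instance including alignment
    // padding -- replace with the value from your own build.
    const unsigned bytes_per_func = 16;

    // Hypothetical cache capacities in bytes (L1, L2, L3).
    const unsigned caches[] = { 32u * 1024, 256u * 1024, 8u * 1024 * 1024 };

    // A jump in the curve is expected once range * bytes_per_func
    // exceeds a cache level.
    for (unsigned cache : caches)
    {
        std::printf("cache of %8u B exceeded at range > %u\n",
                    cache, cache / bytes_per_func);
    }
    return 0;
}
```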
The blue curve behaves in a completely different way: its runtime is constant for small ranges and then increases roughly logarithmically. For large ranges, however, the curve approaches a constant asymptote again. How can the qualitative differences between the two curves be explained?
I am currently using GCC MinGW Win32 x86 v4.8.1 with g++ -std=c++11 -ftemplate-depth=65536 and no compiler optimization.
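Assuming MAX_FUNCS and WORKLOAD are passed in via -D rather than #defined in the source, a full invocation would look something like `g++ -std=c++11 -ftemplate-depth=65536 -DMAX_FUNCS=4096 -DWORKLOAD=100000000 bench.cpp`, where the file name and the two -D values are only placeholder examples.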
Any help would be greatly appreciated. I am also interested in any ideas on how to improve the experiment itself. Thanks in advance!