Neither GCC nor Clang will inline calls through an array of function pointers known at compile time - why?

Question

Sample code in Compiler Explorer: https://godbolt.org/g/fPfw4k

I tried to use an array of function pointers as a jump table instead of a switch, since I found it cleaner. To my surprise, however, neither GCC nor Clang seems able to inline the calls made through it.

Is there a specific reason?

Sample code, in case the link goes dead:

    namespace{
        template<int N>
        int bar(){
            return N;
        }

        int foo1(int n){
            if(n < 0 || n > 5){
                __builtin_unreachable();
            }
            #if __clang__
            __builtin_assume(n >= 0 && n <= 5);
            #endif
            static int (* const fns[])() = {
                bar<0>, bar<1>, bar<2>, bar<3>, bar<4>, bar<5>
            };
            return fns[n]();
        }

        int foo2(int n){
            #if __clang__
            __builtin_assume(n >= 0 && n <= 5);
            #endif
            switch(n){
                case 0: return bar<0>();
                case 1: return bar<1>();
                case 2: return bar<2>();
                case 3: return bar<3>();
                case 4: return bar<4>();
                case 5: return bar<5>();
                default: __builtin_unreachable();
            }
        }
    }

    int main(int argc, char** argv){
        volatile int n = foo1(argc);
        volatile int p = foo2(argc);
    }

Using the always_inline attribute extension provided by GCC and Clang makes no difference either.

c++ gcc

Rusty shackleford Jun 18 '17 at 10:48

1 answer

Paulr · Answer 1 · 2017-07-11T14:06:01+0000

The compiler cannot inline the call in foo1 because the call does not use a compile-time constant callee. If the compiler knows, by inlining foo1 into its caller, that a constant argument was passed to it, it will inline the correct function.

Consider the following example:

    namespace{
        template<int N>
        int bar(){
            return N;
        }

        int foo1(int n){
            if(n < 0 || n > 5){
                __builtin_unreachable();
            }
            #if __clang__
            __builtin_assume(n >= 0 && n <= 5);
            #endif
            static int (* const fns[])() = {
                bar<0>, bar<1>, bar<2>, bar<3>, bar<4>, bar<5>
            };
            return fns[n]();
        }
    }

    int main(int argc, char** argv){
        int n = foo1(3);
        return n;
    }

Both compilers compile it to the following code:

    main:
        mov eax, 3
        ret
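The same point can be checked at the language level, without reading assembly. This is only a sketch: it assumes C++14 or later, marks bar as constexpr (which the original code does not do), and table_call is a made-up name standing in for foo1. With a constant index, the callee taken from the table is itself a compile-time constant, so the whole call can be evaluated in a static_assert:

```cpp
namespace {
    // Assumption: bar is made constexpr here; the question's bar is not.
    template<int N>
    constexpr int bar(){
        return N;
    }

    // Hypothetical constexpr counterpart of foo1: same table, same indexed call.
    constexpr int table_call(int n){
        constexpr int (*fns[])() = {
            bar<0>, bar<1>, bar<2>, bar<3>, bar<4>, bar<5>
        };
        return fns[n]();
    }
}

// The index is a constant, so the callee is known during compilation.
static_assert(table_call(3) == 3, "callee resolved at compile time");
```

With a runtime index the function still works, but the callee is then only known when the program runs, which is the situation foo1(argc) is in.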

In the case of foo2, the compiler starts out with six different calls with constant callees, all of which it inlines. It then optimizes the resulting code further, generating its own jump table if it considers that profitable.

I guess the compiler could try to extract a switch from your jump table and then inline everything, but that would be quite complex and is unlikely to yield a performance improvement in the general case, so neither GCC nor Clang seems to do it.
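If you want to avoid both the pointer table and a hand-written switch, you can do that extraction yourself with templates. The following is only a sketch under assumptions: dispatch and foo3 are made-up names, and the hard-coded count of 6 mirrors the table in the question. Every call it generates names its callee directly, so the compiler is in the same position as with foo2; whether it then builds a jump table or something else is still up to the optimizer.

```cpp
namespace {
    template<int N>
    int bar(){
        return N;
    }

    // Hypothetical helper: tests n against I, I+1, ..., Count-1 in turn.
    // Each branch calls bar<I>() directly, so every callee is a constant.
    template<int I, int Count>
    struct dispatch {
        static int call(int n){
            return n == I ? bar<I>() : dispatch<I + 1, Count>::call(n);
        }
    };

    // End of the chain: n was outside [0, Count), which the caller rules out.
    template<int Count>
    struct dispatch<Count, Count> {
        static int call(int){
            __builtin_unreachable();
        }
    };

    // Hypothetical replacement for foo1; n must be in the range 0..5.
    int foo3(int n){
        return dispatch<0, 6>::call(n);
    }
}
```

Like foo1 and foo2, foo3 has undefined behavior for n outside 0..5, since that path ends in __builtin_unreachable().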
