Neither GCC nor Clang will inline calls through an array of function pointers known at compile time - why?

Sample code in Compiler Explorer: https://godbolt.org/g/fPfw4k

I tried to use an array of function pointers as a jump table instead of a switch, since I found it cleaner. However, to my surprise, neither GCC nor Clang seems able to inline the calls made through it.

Is there a specific reason?

Sample code, in case the link goes dead:

    namespace {
        template<int N>
        int bar() {
            return N;
        }

        int foo1(int n) {
            if (n < 0 || n > 5) {
                __builtin_unreachable();
            }
    #if __clang__
            __builtin_assume(n >= 0 && n <= 5);
    #endif
            static int (* const fns[])() = {
                bar<0>, bar<1>, bar<2>, bar<3>, bar<4>, bar<5>
            };
            return fns[n]();
        }

        int foo2(int n) {
    #if __clang__
            __builtin_assume(n >= 0 && n <= 5);
    #endif
            switch (n) {
                case 0: return bar<0>();
                case 1: return bar<1>();
                case 2: return bar<2>();
                case 3: return bar<3>();
                case 4: return bar<4>();
                case 5: return bar<5>();
                default: __builtin_unreachable();
            }
        }
    }

    int main(int argc, char** argv) {
        volatile int n = foo1(argc);
        volatile int p = foo2(argc);
    }

Using the always_inline attribute extension provided by GCC and Clang makes no difference either.

1 answer

The compiler cannot inline the call in foo1 because the callee is not a compile-time constant. If the compiler knows that a constant argument was passed to foo1, which it can see once foo1 itself has been inlined into its caller, it will inline the correct function.

Consider the following example:

    namespace {
        template<int N>
        int bar() {
            return N;
        }

        int foo1(int n) {
            if (n < 0 || n > 5) {
                __builtin_unreachable();
            }
    #if __clang__
            __builtin_assume(n >= 0 && n <= 5);
    #endif
            static int (* const fns[])() = {
                bar<0>, bar<1>, bar<2>, bar<3>, bar<4>, bar<5>
            };
            return fns[n]();
        }
    }

    int main(int argc, char** argv) {
        int n = foo1(3);
        return n;
    }

Both compilers compile it to the following code:

    main:
        mov eax, 3
        ret

In the case of foo2, the compiler starts with six different calls (one per case), each with a constant callee, and it inlines all of them. It then optimizes the resulting code further, generating its own jump table if it considers that profitable.

I guess the compiler could try to extract a switch from your jump table and then inline everything, but that would be quite complex and unlikely to yield a performance improvement in the general case, so neither GCC nor Clang seems to do it.
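If the goal is only to avoid writing the switch by hand, one possible workaround is to let a template expand into direct calls, so that every call site has a constant callee just as in foo2. The following is a minimal sketch, not something taken from the question: the names dispatch and foo3 are made up for illustration, and it assumes C++14 for std::integer_sequence. Each bar<I> is called directly, so the compiler is free to inline it; whether the resulting chain of comparisons is then turned into a jump table is still up to the optimizer.

```cpp
#include <cassert>
#include <utility>

namespace {
    template<int N>
    int bar() {
        return N;
    }

    // Hypothetical helper: expands into one comparison per index, each
    // guarding a direct call to bar<Is>, so every callee is a constant.
    template<int... Is>
    int dispatch(int n, std::integer_sequence<int, Is...>) {
        int result = -1; // returned unchanged only if n matches no index
        int dummy[] = { (n == Is ? (result = bar<Is>(), 0) : 0)... };
        (void)dummy; // the array exists only to drive the pack expansion
        return result;
    }

    // Hypothetical counterpart of foo1/foo2 built on the helper above.
    int foo3(int n) {
        assert(n >= 0 && n <= 5);
        return dispatch(n, std::make_integer_sequence<int, 6>{});
    }
}
```

Calling foo3(argc) in place of foo1(argc) in the original main is enough to compare the generated code in Compiler Explorer.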


Source: https://habr.com/ru/post/1268949/

