I have a function that finds the exponent of the next power of two for a given integer. If the integer is already a power of two, it returns that power's exponent.
Pretty straightforward:
char nextpow2if(int a) {
    char foo = char(32 - __builtin_clz(a));
    bool ispow2 = !(a & (a - 1));
    if (ispow2) --foo;
    return foo;
}
However, after compiling with gcc 6 at -O2 and inspecting the generated assembly, I see that after computing foo-1 the compiler emits a seemingly useless cmovne instruction. Even worse, with gcc 5 and older I get an actual jne branch in the code.
A faster way to compile this would be to treat it as if I had written the following function:
char nextpow2sub(int a) {
    char foo = char(32 - __builtin_clz(a));
    bool ispow2 = !(a & (a - 1));
    return foo - ispow2;
}
This code is compiled by all the compilers I tried to the shortest (and fastest) possible assembly, using sete and a subtraction of the bool.
Why can't the compiler optimize the first version? It looks like a very simple pattern to recognize. Why do gcc 5 and older compile it into an actual jne branch? Is there an edge case I don't see in which the two versions could behave differently?
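To look for such an edge case empirically, one option is to compare the two functions over a range of inputs. The sketch below is only that: count_mismatches is a hypothetical helper, not part of my original code, and the range is restricted to positive values because __builtin_clz(0) has an undefined result and a - 1 overflows for INT_MIN. A check like this can only fail to find a counterexample in the tested range; it does not prove the two versions equivalent.

```cpp
// The two versions from the question, unchanged apart from layout
// and explicit parentheses around (a - 1).
char nextpow2if(int a) {
    char foo = char(32 - __builtin_clz(a));
    bool ispow2 = !(a & (a - 1));
    if (ispow2) --foo;
    return foo;
}

char nextpow2sub(int a) {
    char foo = char(32 - __builtin_clz(a));
    bool ispow2 = !(a & (a - 1));
    return foo - ispow2;
}

// Hypothetical helper: counts the inputs in [lo, hi] for which the two
// versions return different results. Assumes lo >= 1 (see the note above).
long count_mismatches(int lo, int hi) {
    long mismatches = 0;
    // A wider loop variable avoids signed overflow if hi is INT_MAX.
    for (long long a = lo; a <= hi; ++a) {
        int x = static_cast<int>(a);
        if (nextpow2if(x) != nextpow2sub(x)) ++mismatches;
    }
    return mismatches;
}
```

For example, calling count_mismatches(1, 1 << 20) scans the first million or so positive inputs; a nonzero result would point to an input where the two versions disagree.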
PS: demo here
Edit: I have not tested performance with gcc 6, but with gcc 5 the latter version is about twice as fast (at least in a synthetic benchmark). This is what actually prompted me to ask this question.