Is this a quirk of optimizers or the result of language rules prohibiting optimization?

I was experimenting with the compilers and found that these two functions generate different assembly in both gcc and clang. I expected that after inlining they would produce the same expression trees and, therefore, identical and optimal assembly.

    constexpr bool is_nonzero_decimal_digit(char const c) noexcept
    {
        return c == '1' || c == '2' || c == '3' || c == '4' || c == '5'
            || c == '6' || c == '7' || c == '8' || c == '9';
    }

    bool is_decimal_digit_v1(char const c) noexcept
    {
        return c == '0' || is_nonzero_decimal_digit(c);
    }

    bool is_decimal_digit_v2(char const c) noexcept
    {
        return c == '0' || c == '1' || c == '2' || c == '3' || c == '4'
            || c == '5' || c == '6' || c == '7' || c == '8' || c == '9';
    }

Clang 3.9.1 -std=c++1z -O3 result:

    is_decimal_digit_v1(char):
        cmp     dil, 48
        sete    cl
        add     dil, -49
        cmp     dil, 9
        setb    al
        or      al, cl
        ret
    is_decimal_digit_v2(char):
        add     dil, -48
        cmp     dil, 10
        setb    al
        ret

gcc 6.3 -std=c++1z -O3 result:

    is_decimal_digit_v1(char):
        cmp     dil, 48
        je      .L3
        sub     edi, 49
        cmp     dil, 8
        setbe   al
        ret
    .L3:
        mov     eax, 1
        ret
    is_decimal_digit_v2(char):
        sub     edi, 48
        cmp     dil, 9
        setbe   al
        ret

So, is this a quirk of optimizers or the result of language rules prohibiting optimization?

1 answer

This is a quirk of the gcc (< 7.0) and clang optimizers. As Cornstalks noted in the comments, gcc 7.0 is able to generate optimal assembly. I also checked VC++ 2015, which handles it as well:

    is_decimal_digit_v2:
        sub     cl, 48
        cmp     cl, 9
        setbe   al
        ret     0
    is_decimal_digit_v1:
        sub     cl, 48
        cmp     cl, 9
        setbe   al
        ret     0

As T.C. pointed out, inlining is performed after some earlier optimizations, one of which, in this particular code, collapses the chain of comparisons into a simpler range check. Doing this before inlining is useful because it shrinks leaf functions, which in turn increases their chances of being inlined. Essentially, the v1 function had already been transformed into something like this:

    bool is_decimal_digit_v3(char const c) noexcept
    {
        if (c == 48) return true; // this is what was inlined
        char tmp = c - 49;
        return tmp >= 0 && tmp < 9;
    }

while v2 was transformed into a much simpler form:

    bool is_decimal_digit_v4(char const c) noexcept
    {
        char tmp = c - 48;
        return tmp >= 0 && tmp < 10;
    }
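
The optimal assembly shown earlier corresponds to folding v4's two signed comparisons into a single unsigned one: the subtraction is done in unsigned arithmetic, so anything below '0' wraps around to a large value and fails the one remaining check. A minimal sketch of that idiom (my own illustration, not part of the original answer):

    // Hypothetical helper, not from the original post: single-comparison
    // range check matching the "sub; cmp; setbe" code generated above.
    bool is_decimal_digit_range(char const c) noexcept
    {
        // c - '0' is computed as int; the cast makes values below '0'
        // wrap around to something >= 10, so one unsigned compare suffices.
        return static_cast<unsigned char>(c - '0') < 10;
    }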

The assembly generated for v3 matches that generated for v1:

    # clang 3.9.1
    is_decimal_digit_v3(char):              # @is_decimal_digit_v3(char)
        cmp     dil, 48
        sete    cl
        add     dil, -49
        cmp     dil, 9
        setb    al
        or      al, cl
        ret

    # gcc 6.3
    is_decimal_digit_v3(char):
        cmp     dil, 48
        je      .L8
        sub     edi, 49
        cmp     dil, 8
        setbe   al
        ret
    .L8:
        mov     eax, 1
        ret

I assume that turning v3 into v4 requires some nontrivial analysis that gcc 7.0 is able to perform. That version generates exactly the same assembly for all four functions:

    is_decimal_digit_v1(char):
        sub     edi, 48
        cmp     dil, 9
        setbe   al
        ret
    is_decimal_digit_v2(char):
        sub     edi, 48
        cmp     dil, 9
        setbe   al
        ret
    is_decimal_digit_v3(char):
        sub     edi, 48
        cmp     dil, 9
        setbe   al
        ret
    is_decimal_digit_v4(char):
        sub     edi, 48
        cmp     dil, 9
        setbe   al
        ret
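
If you want to verify that the four variants really are equivalent, a brute-force check over every char value is enough. This little harness is my own addition and assumes the function definitions above are in the same translation unit:

    // My own test harness (not part of the original answer): compares all
    // four variants against the obvious reference check for every char value.
    #include <cassert>
    #include <climits>

    int main()
    {
        for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
            char const c = static_cast<char>(i);
            bool const expected = (c >= '0' && c <= '9');
            assert(is_decimal_digit_v1(c) == expected);
            assert(is_decimal_digit_v2(c) == expected);
            assert(is_decimal_digit_v3(c) == expected);
            assert(is_decimal_digit_v4(c) == expected);
        }
    }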

Interestingly, VC++ 2015 cannot convert v3 into v4 and produces this assembly:

    is_decimal_digit_v3:
        cmp     cl, 48
        jne     SHORT $LN2@is_decimal
        mov     al, 1
        ret     0
    $LN2@is_decimal:
        xor     eax, eax
        sub     cl, 49
        cmp     cl, 8
        setbe   al
        ret     0

If I had to guess why it generates optimal code for v1 but not for v3, I would say it is because it performs inlining before collapsing the comparison chain into a range check.


Source: https://habr.com/ru/post/1013955/

