This is a quirk of the gcc (<7.0) and clang optimizers. As Cornstalks noted in the comments, gcc 7.0 is able to generate optimal assembly. I also checked VC++ 2015, which does too:

    is_decimal_digit_v2:
            sub     cl, 48
            cmp     cl, 9
            setbe   al
            ret     0
    is_decimal_digit_v1:
            sub     cl, 48
            cmp     cl, 9
            setbe   al
            ret     0

As T.C. pointed out, inlining is done after some optimization passes, which in this particular code fuse the chain of comparisons into a simpler range check. It is useful to do this before inlining to make leaf functions smaller, which in turn increases their chances of being inlined. Basically, the v1 function was transformed into something like this:

    bool is_decimal_digit_v3(char const c) noexcept
    {
        if (c == 48) return true;
        // range check on the remaining digits, 49..57
        char tmp = c - 49;
        return tmp >= 0 && tmp < 9;
    }

while v2 was transformed into a much simpler form:

    bool is_decimal_digit_v4(char const c) noexcept
    {
        char tmp = c - 48;
        return tmp >= 0 && tmp < 10;
    }

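
To see why a range check like this can compile down to a single `sub` / `cmp` / `setbe` sequence, here is a minimal sketch of the underlying trick. The function names are hypothetical and not from the question. After subtracting the lower bound, anything that was below it wraps around to a large unsigned value, so one unsigned "below or equal" compare covers both ends of the range.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.

// The range test written as a chain of two comparisons.
constexpr bool in_range_chain(char c) noexcept {
    return c >= 48 && c <= 57;
}

// The same test as a single unsigned compare: values below 48 wrap
// around to large unsigned numbers after the subtraction, so they
// fail the `<= 9` test as well.
constexpr bool in_range_unsigned(char c) noexcept {
    return static_cast<unsigned char>(c - 48) <= 9;
}

// Returns true if both forms agree on all 256 possible char values.
constexpr bool forms_agree() noexcept {
    for (int i = 0; i < 256; ++i) {
        char const c = static_cast<char>(i);
        if (in_range_chain(c) != in_range_unsigned(c)) return false;
    }
    return true;
}

// Checked at compile time (needs C++14 or later for the constexpr loop).
static_assert(forms_agree(), "both forms accept exactly 48..57");
```

The unsigned cast plays the role that `setbe` (an unsigned comparison) plays in the assembly listings.
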
The assembly generated for v3 is similar to the one generated for v1:

    # clang 3.9.1
    is_decimal_digit_v3(char):                # @is_decimal_digit_v3(char)
            cmp     dil, 48
            sete    cl
            add     dil, -49
            cmp     dil, 9
            setb    al
            or      al, cl
            ret

    # gcc 6.3
    is_decimal_digit_v3(char):
            cmp     dil, 48
            je      .L8
            sub     edi, 49
            cmp     dil, 8
            setbe   al
            ret
    .L8:
            mov     eax, 1
            ret

I assume that converting v3 into v4 requires some nontrivial analysis, which gcc 7.0 is able to do. That version generates exactly the same assembly for all four snippets:

    is_decimal_digit_v1(char):
            sub     edi, 48
            cmp     dil, 9
            setbe   al
            ret
    is_decimal_digit_v2(char):
            sub     edi, 48
            cmp     dil, 9
            setbe   al
            ret
    is_decimal_digit_v3(char):
            sub     edi, 48
            cmp     dil, 9
            setbe   al
            ret
    is_decimal_digit_v4(char):
            sub     edi, 48
            cmp     dil, 9
            setbe   al
            ret

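
I don't know which pass in gcc 7.0 is responsible, but the fact it has to establish is simple to state: the single value 48 sits directly below the range 49..57, so "equals 48, or lies in 49..57" is the same predicate as "lies in 48..57". The sketch below checks that exhaustively and also shows a case where the merge would be wrong. All helper names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical helpers, for illustration only.

// "c equals point, or c lies in [lo, hi]": the shape of v3.
constexpr bool point_or_range(int c, int point, int lo, int hi) noexcept {
    return c == point || (c >= lo && c <= hi);
}

// A single closed range: the shape of v4.
constexpr bool single_range(int c, int lo, int hi) noexcept {
    return c >= lo && c <= hi;
}

// True if "point or [lo, hi]" and "[merged_lo, merged_hi]" agree for
// every value a char can hold, whether char is signed or unsigned.
constexpr bool mergeable(int point, int lo, int hi,
                         int merged_lo, int merged_hi) noexcept {
    for (int c = -128; c <= 255; ++c) {
        if (point_or_range(c, point, lo, hi) !=
            single_range(c, merged_lo, merged_hi)) {
            return false;
        }
    }
    return true;
}

// 48 is adjacent to 49..57, so the two tests fold into 48..57.
static_assert(mergeable(48, 49, 57, 48, 57), "adjacent point and range merge");

// 46 is not adjacent: the range 46..57 would wrongly accept 47 and 48.
static_assert(!mergeable(46, 49, 57, 46, 57), "a gap prevents the merge");
```

An optimizer that only sees the equality test and the range check as two unrelated conditions has no reason to try the merge, which would fit the extra `cmp`/branch in the gcc 6.3, clang 3.9.1 and VC++ 2015 output for v3.
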
Interestingly, VC++ 2015 is not able to convert v3 into v4 and produces this assembly:

    is_decimal_digit_v3:
            cmp     cl, 48
            jne     SHORT $LN2@is_decimal
            mov     al, 1
            ret     0
    $LN2@is_decimal:
            xor     eax, eax
            sub     cl, 49
            cmp     cl, 8
            setbe   al
            ret     0

If I had to guess, I would say the reason it generates optimal code for v1 but not for v3 is that it does the inlining before reducing the comparisons to a range check.