I ran a test on godbolt.org, think_and_do and in Basic.
First observation: if your examples are trivial, they are mostly optimized away. Without cin, both examples would have compiled to:
    xor eax, eax
    add rsp, 8    # may or may not be present
    ret
The second point is that the code is exactly the same as in main: none of the functions is called, everything is inlined.
The third point is that both examples produce the following code:
    mov edx, DWORD PTR a[rip]
    mov eax, DWORD PTR b[rip]
    cmp edx, eax
    je .L8
That is, they fill one issue cycle with 4 instructions in order to make full use of it (and ignore the possibility of macro-fusing the cmp and the jump).
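For anyone who wants to reproduce this, here is a minimal sketch of the kind of source I would expect to give a listing like the one above. The question's actual examples are not repeated here, so everything in it is an assumption: the globals are named a and b only to match the symbols in the assembly, and read_inputs, equal_helper, variant_with_helper and variant_inline are hypothetical names. The exact output depends on the compiler and flags.

```cpp
#include <iostream>

// Hypothetical globals, named to match the a[rip] / b[rip] symbols in the listing.
int a = 0;
int b = 0;

// Reading the values at run time keeps the compiler from folding the
// comparison into a constant (the "without cin" case described earlier).
void read_inputs(std::istream& in) {
    in >> a >> b;
}

// Variant 1: the comparison is wrapped in a small helper function.
// An optimizing compiler will normally inline it into the caller.
static bool equal_helper(int x, int y) {
    return x == y;
}

int variant_with_helper() {
    return equal_helper(a, b) ? 1 : 0;
}

// Variant 2: the same comparison written directly in the function body.
int variant_inline() {
    return (a == b) ? 1 : 0;
}
```

To try it, add a main that calls read_inputs(std::cin) and then branches on one of the two variants, and compile with optimizations enabled (for example -O2). If the helper is inlined as expected, both variants should load a and b and compare them in the same way.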
If they had started with
    cmp edx, eax
    je .L8
half of the issue bandwidth could potentially be wasted.