EDIT: Indeed, I had a strange error in my timing code that led to these results. Once I fixed my mistake, the smart version was faster, as expected. My timing code looked like this:
    bool x = false;
    before = now();
    for (int i=0; i<N; ++i) {
        x ^= smart_xor(A[i],B[i]);
    }
    after = now();
I used ^= to keep my compiler from optimizing away the for loop. But I think the ^= somehow interacted strangely with the two xor functions. I changed the timing code to simply fill an array with the xor results, and then do the computation on that array outside the timed code. That fixed things.
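For illustration, a minimal sketch of the corrected measurement described above might look like the following. The helper names `time_xor` and `count_true`, the use of `std::chrono::steady_clock` (C++11) in place of the unspecified `now()`, and `std::vector<char>` as the array type are all assumptions made for this sketch, not the original timing code.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

bool smart_xor(bool a, bool b) { return a ^ b; }
bool dumb_xor(bool a, bool b) { return a ? !b : b; }

// Hypothetical helper: times f over the inputs and returns the average
// nanoseconds per call. Each result is stored in `out` instead of being
// folded into one accumulator with ^=, so the loop has an observable effect
// without mixing an extra xor into the code under test.
// Assumes A, B and out have the same non-zero length.
template <typename F>
double time_xor(F f, const std::vector<char>& A, const std::vector<char>& B,
                std::vector<char>& out) {
    const std::size_t n = A.size();
    auto before = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = f(A[i] != 0, B[i] != 0);
    }
    auto after = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(after - before).count() / n;
}

// Hypothetical helper: consumes the result array outside the timed region,
// so the stores in time_xor are actually used.
std::size_t count_true(const std::vector<char>& out) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (out[i] != 0) ++total;
    }
    return total;
}
```

Calling `time_xor(smart_xor, A, B, out)` and `time_xor(dumb_xor, A, B, out)` on the same inputs, and printing `count_true(out)` after each, gives the two per-call averages. Passing the functions as arguments like this may prevent inlining, so the numbers are closer to the non-inlined case.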
Should I delete this question?
End edit
I defined two C++ functions as follows:
    bool smart_xor(bool a, bool b) {
        return a^b;
    }

    bool dumb_xor(bool a, bool b) {
        return a?!b:b;
    }
My timing tests show that dumb_xor() is slightly faster (1.31ns versus 1.90ns when inlined, 1.92ns versus 2.21ns when not inlined). This puzzles me, since the ^ operator should be a single machine operation. I am wondering if anyone has an explanation.
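As a sanity check that the two functions really compute the same thing, so that the timings compare like with like, all four input combinations can be tested exhaustively. This is only a sketch; the helper name `xor_functions_agree` is hypothetical.

```cpp
bool smart_xor(bool a, bool b) { return a ^ b; }
bool dumb_xor(bool a, bool b) { return a ? !b : b; }

// Hypothetical helper: returns true if both functions give the same result
// for every one of the four possible (a, b) input pairs.
bool xor_functions_agree() {
    const bool values[2] = { false, true };
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            if (smart_xor(values[i], values[j]) != dumb_xor(values[i], values[j])) {
                return false;
            }
        }
    }
    return true;
}
```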
Here is the assembly (for the non-inlined versions):
        .file   "xor.cpp"
        .text
        .p2align 4,,15
    .globl _Z9smart_xorbb
        .type   _Z9smart_xorbb, @function
    _Z9smart_xorbb:
    .LFB0:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        movl    %esi, %eax
        xorl    %edi, %eax
        ret
        .cfi_endproc
    .LFE0:
        .size   _Z9smart_xorbb, .-_Z9smart_xorbb
        .p2align 4,,15
    .globl _Z8dumb_xorbb
        .type   _Z8dumb_xorbb, @function
    _Z8dumb_xorbb:
    .LFB1:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        movl    %esi, %edx
        movl    %esi, %eax
        xorl    $1, %edx
        testb   %dil, %dil
        cmovne  %edx, %eax
        ret
        .cfi_endproc
    .LFE1:
        .size   _Z8dumb_xorbb, .-_Z8dumb_xorbb
        .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
        .section        .note.GNU-stack,"",@progbits
I am using g++ 4.4.3-4ubuntu5 on an Intel Xeon X5570. I compiled with -O3.