None of the top-rated answers are actually definitive "facts"... they are people speculating!
You can know for certain which code takes fewer assembly instructions to execute, because you can look at the assembly generated by the compiler and see which version compiles to fewer instructions!
Here is the C code I compiled with the flags "gcc -std=c99 -S -O3 lookAtAsmOutput.c":
#include <stdio.h>
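The rest of the C listing is missing here. Judging only from the function names and the assembly in this answer, the two functions were probably close to the following sketch. The parameter names and exact bodies are assumptions, not the original source.

```c
/* Assumed reconstruction: swap through a temporary variable. */
void swap_traditional(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

/* Assumed reconstruction: swap with three XORs and no temporary.
 * Note: if a and b point to the same int, this sets it to zero. */
void swap_xor(int *a, int *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Saved as lookAtAsmOutput.c, a file like this can be compiled with `gcc -std=c99 -S -O3 lookAtAsmOutput.c` to produce lookAtAsmOutput.s. The exact instructions will differ with the compiler version and target architecture.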
ASM output for swap_traditional() takes >>> 11 <<< instructions (not including "leave", "ret", "size"):
    .globl swap_traditional
        .type   swap_traditional, @function
    swap_traditional:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        movl    12(%ebp), %ecx
        pushl   %ebx
        movl    (%edx), %ebx
        movl    (%ecx), %eax
        movl    %ebx, (%ecx)
        movl    %eax, (%edx)
        popl    %ebx
        popl    %ebp
        ret
        .size   swap_traditional, .-swap_traditional
        .p2align 4,,15
ASM output for swap_xor() takes >>> 11 <<< instructions (not including "leave" and "ret"):
    .globl swap_xor
        .type   swap_xor, @function
    swap_xor:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %ecx
        movl    12(%ebp), %edx
        movl    (%ecx), %eax
        xorl    (%edx), %eax
        movl    %eax, (%ecx)
        xorl    (%edx), %eax
        xorl    %eax, (%ecx)
        movl    %eax, (%edx)
        popl    %ebp
        ret
        .size   swap_xor, .-swap_xor
        .p2align 4,,15
Summary of assembly output:
swap_traditional() takes 11 instructions
swap_xor() takes 11 instructions
Conclusion:
Both methods use the same number of instructions to execute and are therefore approximately the same speed on this hardware platform.
Lesson learned:
When you have small code snippets, looking at the asm output is helpful for rapidly iterating on your code and arriving at the fastest version (i.e. the one with the fewest instructions). You also save time because you don't have to run the program for every code change. You only need to run the final version through a profiler at the end to show that your code changes are faster.
I use this method for heavy DSP code that requires speed.
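For the final confirmation step, a profiler is the proper tool. As a rough stand-in, a timing loop built on the standard `clock()` function could look like the sketch below. The helper name `time_swaps`, its signature, and the swap bodies are assumptions made for illustration.

```c
#include <time.h>

/* Assumed swap implementations, matching the function names in this answer. */
static void swap_traditional(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

static void swap_xor(int *a, int *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Hypothetical helper: returns the CPU seconds spent calling `swap`
 * `iterations` times on the two ints that a and b point to. */
static double time_swaps(void (*swap)(int *, int *), long iterations,
                         int *a, int *b)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        swap(a, b);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling, say, `time_swaps(swap_xor, 100000000L, &x, &y)` and `time_swaps(swap_traditional, 100000000L, &x, &y)` from `main` and printing both results gives a first comparison. At -O3 the compiler may inline the call or shorten the loop, so it is worth checking the asm output of the benchmark itself before trusting the numbers.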