I wrote the following (hopefully valid) test:
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    void func(uint64_t x);

    int main(int argc, char **argv)
    {
    #ifdef UNION
      union {
        uint64_t full;
        struct {
          uint32_t low;
          uint32_t high;
        } p;
      } result;
      #define value result.full
    #else
      uint64_t result;
      #define value result
    #endif
      uint32_t high, low;

      if (argc < 3) return 0;

      high = atoi(argv[1]);
      low = atoi(argv[2]);

    #ifdef UNION
      result.p.high = high;
      result.p.low = low;
    #else
      result = ((uint64_t) high << 32) | low;
    #endif

      // printf("%08x%08x\n", (uint32_t) (value >> 32), (uint32_t) (value & 0xffffffff));
      func(value);

      return 0;
    }
Running a diff of the unoptimized gcc -S output:
    <       mov     -4(%rbp), %eax
    <       movq    %rax, %rdx
    <       salq    $32, %rdx
    <       mov     -8(%rbp), %eax
    <       orq     %rdx, %rax
    <       movq    %rax, -16(%rbp)
    ---
    >       movl    -4(%rbp), %eax
    >       movl    %eax, -12(%rbp)
    >       movl    -8(%rbp), %eax
    >       movl    %eax, -16(%rbp)
I don't know assembly, so it is hard for me to analyze this. However, it looks like some shifting is taking place, as expected, in the non-union (top) version.
But with -O2 optimization turned on, the output was identical. The same code was generated either way, so both approaches will have the same performance.
(gcc version 4.5.2 on Linux / AMD64)
Partial output of the optimized -O2 code, with or without the union:
        movq    8(%rsi), %rdi
        movl    $10, %edx
        xorl    %esi, %esi
        call    strtol
        movq    16(%rbx), %rdi
        movq    %rax, %rbp
        movl    $10, %edx
        xorl    %esi, %esi
        call    strtol
        movq    %rbp, %rdi
        mov     %eax, %eax
        salq    $32, %rdi
        orq     %rax, %rdi
        call    func
The snippet starts immediately after the jump generated by the if line.
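For anyone who wants to check that the two approaches also produce the same value, and not just the same code, here is a minimal sketch. The function names (combine_shift, combine_union, is_little_endian, check_combine) are made up for illustration and are not part of the test above. It assumes a little-endian target such as AMD64, where the low member of the struct overlaps the low 32 bits of full. On a big-endian machine the union version would give the halves swapped, so the comparison is skipped there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine the halves with a shift and OR,
   as in the non-union build of the test. */
uint64_t combine_shift(uint32_t high, uint32_t low)
{
    return ((uint64_t) high << 32) | low;
}

/* Hypothetical helper: combine the halves through a union,
   as in the UNION build. The member order (low first) only
   matches the shift version on a little-endian machine. */
uint64_t combine_union(uint32_t high, uint32_t low)
{
    union {
        uint64_t full;
        struct {
            uint32_t low;
            uint32_t high;
        } p;
    } result;

    result.p.high = high;
    result.p.low = low;
    return result.full;
}

/* Returns 1 if the lowest-addressed byte of an integer holds its
   least significant bits (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *) &probe == 1;
}

/* Asserts that both approaches agree; the check is skipped on
   big-endian targets, where the union layout differs. */
void check_combine(uint32_t high, uint32_t low)
{
    if (is_little_endian())
        assert(combine_union(high, low) == combine_shift(high, low));
}
```

Calling check_combine(high, low) with the values parsed from argv in the test would abort if the two results ever differed. The shift version is the portable one, since it does not depend on byte order.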