Is a union more efficient than a shift on modern compilers?

Question

Consider this simple code:

 UINT64 result;
 UINT32 high, low;
 ...
 result = ((UINT64)high << 32) | (UINT64)low;

Do modern compilers turn this into a real barrel shift on high, or do they optimize it to a simple copy to the right location?

If not, then using a union would seem to be more efficient than the shift that most people use. However, having the compiler optimize this is the ideal solution.

I am wondering how I should advise people when they do need that extra little bit of performance.

+6

performance c compiler-optimization unions shift

Adam davis May 25, '11 at 18:00

4 answers

Modern compilers are smarter than you might think ;-) (yes, I think you can expect a barrel shift on any decent compiler).

In any case, I would use the option whose semantics are closer to what you are actually trying to do.

+4

fortran May 25 '11 at 18:17

If this is supposed to be platform independent, then the only option is to use shifts here.

With union { r64; struct { low; high; } } you cannot tell which bits of r64 the low / high fields will be mapped to. Think of endianness.

Modern compilers handle such shifts pretty well.
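To make the endianness point concrete, here is a minimal sketch. The names split64, combine_shift, combine_union and union_matches_shift are hypothetical, chosen only for illustration. The shift works on values, so it gives the same answer everywhere; the union works on memory layout, so whether it agrees depends on the host byte order.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical type; mirrors the union { r64; struct { low; high; } } idea. */
union split64 {
    uint64_t r64;
    struct { uint32_t low; uint32_t high; } p;
};

/* The union approach only makes sense if the struct has no padding. */
static_assert(sizeof(union split64) == sizeof(uint64_t), "unexpected padding");

/* Value-based: gives the same result on every platform. */
uint64_t combine_shift(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Layout-based: the result depends on the byte order of the host. */
uint64_t combine_union(uint32_t high, uint32_t low)
{
    union split64 u;
    u.p.low = low;
    u.p.high = high;
    return u.r64;
}

/* Returns 1 if the union layout agrees with the shift on this host. */
int union_matches_shift(void)
{
    return combine_union(0x11223344u, 0x55667788u)
        == combine_shift(0x11223344u, 0x55667788u);
}
```

On a little-endian host such as x86-64, union_matches_shift() should return 1. On a big-endian host, with this member order, the two halves end up swapped and it returns 0.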

+4

c-smile May 25 '11 at 18:20

EDIT: This answer is based on an earlier version of the OP's code, which did not have the cast.

This code

 result = (high << 32) | low;

will actually have undefined results: high is a 32-bit value and you are shifting it by 32 bits (the full width of the type), so the behavior is undefined and depends on how the compiler and platform decide to handle the shift. The result of that shift is then OR'ed with low, so the final value is undefined as well and will most likely not be the 64-bit value you want. For example, the code emitted by gcc -S on OS X 10.6 looks like this:

 movl -4(%rbp), %eax    //retrieving the value of "high"
 movl $32, %ecx
 shal %cl, %eax         //performing the 32-bit shift on "high"
 orl -8(%rbp), %eax     //OR'ing the value of "low" to the shift op result

So you can see that the shift is performed only on a 32-bit value in a 32-bit register with a 32-bit instruction. The result ends up being the same as high | low without any shifting at all, because in this case shal $32, %eax just returns the value that was originally in EAX. You do not get a 64-bit result.

To avoid this, cast high to uint64_t:

 result = ((uint64_t)high << 32) | low;
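As a small sketch of where the cast has to go: the operand must be widened before the shift, not after it. The helper names make_u64, high_half and low_half are hypothetical and used only for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Compile-time check: once the operand is 64 bits wide, shifting by 32 is well defined. */
static_assert(((uint64_t)1 << 32) == 0x100000000ULL, "widened shift sets bit 32");

/* Builds a 64-bit value from two 32-bit halves. */
uint64_t make_u64(uint32_t high, uint32_t low)
{
    /* Not (uint64_t)(high << 32): that would still shift a 32-bit operand. */
    return ((uint64_t)high << 32) | low;
}

/* Splitting the value again shows that neither half was lost. */
uint32_t high_half(uint64_t v) { return (uint32_t)(v >> 32); }
uint32_t low_half(uint64_t v)  { return (uint32_t)v; }
```

For example, make_u64(0xdeadbeef, 0xcafebabe) yields 0xdeadbeefcafebabe, and high_half / low_half return the two original inputs.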

+2

Jason May 25 '11 at 19:04

Matthew · Accepted Answer · 2011-05-25T18:59:50+0000

I wrote the following (hopefully valid) test:

 #include <stdio.h>
 #include <stdint.h>
 #include <stdlib.h>

 void func(uint64_t x);

 int main(int argc, char **argv)
 {
 #ifdef UNION
     union {
         uint64_t full;
         struct {
             uint32_t low;
             uint32_t high;
         } p;
     } result;
     #define value result.full
 #else
     uint64_t result;
     #define value result
 #endif
     uint32_t high, low;

     if (argc < 3) return 0;

     high = atoi(argv[1]);
     low = atoi(argv[2]);

 #ifdef UNION
     result.p.high = high;
     result.p.low = low;
 #else
     result = ((uint64_t) high << 32) | low;
 #endif

     // printf("%08x%08x\n", (uint32_t) (value >> 32), (uint32_t) (value & 0xffffffff));
     func(value);

     return 0;
 }

Running a diff of the unoptimized gcc -S output:

 < mov -4(%rbp), %eax
 < movq %rax, %rdx
 < salq $32, %rdx
 < mov -8(%rbp), %eax
 < orq %rdx, %rax
 < movq %rax, -16(%rbp)
 ---
 > movl -4(%rbp), %eax
 > movl %eax, -12(%rbp)
 > movl -8(%rbp), %eax
 > movl %eax, -16(%rbp)

I do not know assembly, so it is hard for me to analyze this. However, it looks like some shifting is taking place, as expected, in the non-union (top) version.

But with -O2 optimization turned on, the output was identical. The same code was generated, so both approaches will have the same performance.

(gcc version 4.5.2 on Linux / AMD64)

Partial output of the optimized -O2 code, with or without the union:

 movq 8(%rsi), %rdi
 movl $10, %edx
 xorl %esi, %esi
 call strtol
 movq 16(%rbx), %rdi
 movq %rax, %rbp
 movl $10, %edx
 xorl %esi, %esi
 call strtol
 movq %rbp, %rdi
 mov %eax, %eax
 salq $32, %rdi
 orq %rax, %rdi
 call func

The snippet starts immediately after the jump generated by the if line.
