How well does the Visual C++ 2008/2010 compiler optimize?

I'm just wondering how well the MSVC++ compiler can optimize code (with code examples), or what it cannot optimize and why.

For example, I used SSE intrinsics with something like this (var is a value of type __m128; this is an excerpt from a test):

if( var.m128_f32[0] > 0.0f && var.m128_f32[1] > 0.0f && var.m128_f32[2] > 0.0f && var.m128_f32[3] > 0.0f ) {
    ...
}

When I looked at the asm output, I saw that it compiled to an ugly, very jumpy version (and I know that CPUs hate tight jumps). I also know that I could optimize it with the SSE4.1 PTEST instruction, but why didn't the compiler do this (even though the compiler authors defined the PTEST intrinsic, so they knew the instruction)?

What other optimizations can it not do (so far)?

Does this mean that with today's technology I am forced to use intrinsics, inline ASM, and linked ASM functions, and will compilers ever find such things on their own (I don't think so)?

Where can I read more about how well the MSVC++ compiler optimizes?

(Edit 1): I used the /arch:SSE2 and /fp:fast switches.

5 answers

By default, the compiler generates code that will run on the lowest-common-denominator CPU, that is, without SSE 4.1 instructions.

You can change this by targeting only later processors in the build options.
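Staying within that lowest-common-denominator instruction set, the four-way test from the question can still be written without per-element branches. The following is only a sketch under stated assumptions: the helper name all_lanes_positive is made up for illustration, it uses baseline SSE intrinsics rather than PTEST, and it is not a claim about what MSVC emits or should emit.

```cpp
#include <cassert>      // only needed for the checks mentioned in the note below
#include <xmmintrin.h>  // baseline SSE: _mm_cmpgt_ps, _mm_movemask_ps

// Hypothetical helper (name not from the question): true when all four
// float lanes of v are strictly greater than zero.
inline bool all_lanes_positive(__m128 v) {
    // Each lane becomes all-ones where v > 0.0f and all-zeros otherwise.
    __m128 gt = _mm_cmpgt_ps(v, _mm_setzero_ps());
    // movemask packs the top bit of each lane into bits 0..3 of an int,
    // so 0xF means every lane passed the comparison.
    return _mm_movemask_ps(gt) == 0xF;
}
```

With the question's variable this would read if (all_lanes_positive(var)) { ... }. For instance, assert(all_lanes_positive(_mm_set1_ps(1.0f))) holds, while a vector with any zero, negative, or NaN lane yields false, which matches the scalar > 0.0f chain.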

, MS " ", SSE. , SSE 4 . GCC SSE:

GCC - , , Intels

, !


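You could also try Intel's ICC compiler, which tends to generate better code than Visual C++, particularly for SSE. A 30-day evaluation is available from intel.com.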


asm- , .


If-statements , , . , CPU ( ), , . , :). , , , , . Agner Fog .

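Part of this follows from the C language rules. The logical AND (&&) short-circuits, so C requires the f32 elements to be tested one at a time, in order, each against > 0.0f (stopping at the first one that fails). Using the notation test1 true branch taken (t1tbt), test1 false no branch (t1fnb), the same for test2 (t2tbt), and so on, the possible paths are: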

t1tbt                      ; var.m128_f32[0] <= 0.0f
t1fnb t2tbt                ; var.m128_f32[0] >  0.0f, var.m128_f32[1] <= 0.0f
t1fnb t2fnb t3tbt          ; var.m128_f32[0] >  0.0f, var.m128_f32[1] >  0.0f,
                           ; var.m128_f32[2] <= 0.0f
t1fnb t2fnb t3fnb t4tbt    ; var.m128_f32[0] >  0.0f, var.m128_f32[1] >  0.0f,
                           ; var.m128_f32[2] >  0.0f, var.m128_f32[3] <= 0.0f
t1fnb t2fnb t3fnb t4fnb    ; var.m128_f32[0] >  0.0f, var.m128_f32[1] >  0.0f
                           ; var.m128_f32[2] >  0.0f, var.m128_f32[3] >  0.0f
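The five paths listed above can be reproduced with a small tracer. This is a minimal sketch, assuming the inverted-test reading suggested by the comments in the listing (a test is "true" when the element is not > 0.0f, and the branch taken skips the body); trace_path is a hypothetical name, not something from the answer.

```cpp
#include <cassert>  // only needed for the checks mentioned in the note below
#include <string>

// Hypothetical tracer: walks the four elements in order, the way a
// short-circuiting && chain must, and returns the path in the
// tNtbt / tNfnb notation used in the listing.
inline std::string trace_path(const float (&f)[4]) {
    std::string path;
    for (int i = 0; i < 4; ++i) {
        if (i > 0) path += ' ';
        path += "t" + std::to_string(i + 1);
        if (!(f[i] > 0.0f)) {
            return path + "tbt";  // test true, branch taken: skip the body
        }
        path += "fnb";            // test false, no branch: fall through
    }
    return path;                  // all four passed: the body executes
}
```

For the input {1.0f, -1.0f, 1.0f, 1.0f} the tracer stops at the second element, so assert(trace_path(a) == "t1fnb t2tbt") holds. The later elements are never examined, which is exactly the ordering constraint that keeps the scalar form branchy.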

, .

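Alternatively, the elements of var can be examined through their integer bit patterns. For example, 1.0f is stored as the bytes 0x00, 0x00, 0x80, 0x3f (x86/little-endian), which read as a 32-bit integer is 0x3f800000, or +1065353216. 0.0f is the bytes 0x00, 0x00, 0x00, 0x00, or 0x00000000 (zero as an integer too). For a float, the sign is carried by the top bit (0x80000000).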
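The bit patterns quoted above can be checked directly. The sketch below is illustrative only: float_bits, sign_bit_set, and positive_via_int are hypothetical helper names, 32-bit IEEE-754 floats are assumed, and the integer comparison matches f > 0.0f only when the input is not NaN.

```cpp
#include <cassert>  // only needed for the checks mentioned in the note below
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Hypothetical helper: raw bit pattern of a float, copied with memcpy
// so that no aliasing rules are violated.
inline std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// True when the sign bit (0x80000000) is set; this includes -0.0f.
inline bool sign_bit_set(float f) {
    return (float_bits(f) & 0x80000000u) != 0;
}

// Integer form of "f > 0.0f". Assumption: f is not NaN, because a
// positive NaN also has a positive integer pattern.
inline bool positive_via_int(float f) {
    std::int32_t as_int;
    std::memcpy(&as_int, &f, sizeof as_int);
    return as_int > 0;
}
```

For example, assert(float_bits(1.0f) == 0x3f800000u) and assert(float_bits(0.0f) == 0u) both hold, and positive_via_int returns false for 0.0f, -0.0f, and any negative value.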


Source: https://habr.com/ru/post/1754704/