Dummy movups created by gcc

A little curiosity I found: GCC seems to generate the following code when I enable many optimization flags:

00000000004019ae: test %si,%si
00000000004019b1: movups %xmm0,%xmm0
00000000004019b4: je 0x401f40 <main(int, char**)+1904>

Question: What is the purpose of the second instruction? It's not like it /does/ anything, so is it some optimization, such as aligning the program in the instruction cache? Or is it some out-of-order execution thing? (I am compiling with -mtune=native on a Nehalem, if that helps :D)

Nothing special, of course, just curious.

2 answers

Perhaps xmm0 contains the result of a calculation performed in the integer domain (by an integer SSE instruction), while the next instruction that uses xmm0 expects it to be in the floating-point domain (a floating-point SSE instruction).

Nehalem may execute that next instruction faster if xmm0 has already been migrated to the floating-point domain by an instruction such as movaps or movups. It may also be useful to perform this migration before the conditional branch, because then it happens only once. Without the movups instruction, the migration could happen twice (automatically, triggered by the first FP instruction that touches the register): once speculatively on a mispredicted branch, and a second time on the correct branch.
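
To make the two domains concrete, here is a minimal C sketch of a value that is produced by an integer SSE operation and then consumed by a floating-point one. The function name abs_then_add is hypothetical and is not taken from the question's program. The instruction names in the comments are only what the intrinsics usually map to; the compiler is free to pick different instructions (for example andps instead of pand), so this illustrates the situation described above and is not a reproduction of the movups in the question.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */

/* Hypothetical example: clear the sign bit with an integer-domain AND,
   then use the result in a floating-point add. */
float abs_then_add(float x, float y)
{
    /* Reinterpret the float bits as integers; the cast emits no instruction. */
    __m128i bits = _mm_castps_si128(_mm_set1_ps(x));

    /* Integer domain: typically pand. Clears the IEEE-754 sign bit. */
    __m128i abs_bits = _mm_and_si128(bits, _mm_set1_epi32(0x7fffffff));

    /* Floating-point domain: typically addps, consuming the integer result. */
    __m128 sum = _mm_add_ps(_mm_castsi128_ps(abs_bits), _mm_set1_ps(y));

    /* Return the lowest lane as a scalar. */
    return _mm_cvtss_f32(sum);
}
```

Calling abs_then_add(-1.5f, 2.0f) computes |-1.5| + 2.0. Compiling with gcc -O2 -S and reading the output shows which instructions were actually chosen on a given compiler version and -mtune setting.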

It seems the compiler decided that optimizing the computation's dependency chains is worth more than optimizing for code size and execution resources.


In addition to the hypothesis proposed by Yevgeny Kluyev, other possibilities (in no particular order) are that (a) it is a compiler optimizer bug, (b) the movups is inserted to break a dependency, or (c) it is inserted for code alignment.


Source: https://habr.com/ru/post/1392396/

