Suppose xmm0 holds the result of a calculation performed in the integer domain (by an integer SSE instruction), and the next instruction that uses xmm0 is expected to run in the floating-point domain (a floating-point SSE instruction).
Nehalem can execute that next instruction faster if xmm0 has already been migrated to the floating-point domain by an instruction such as movaps or movups. It can also pay to perform this migration before the conditional branch, because then it happens only once. Without the movups, the migration may be performed twice (automatically, by the first FP instruction that touches the register): once speculatively on the mispredicted path, and again on the correct path.
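To make the shape of that pattern concrete, here is a minimal sketch in C with SSE2 intrinsics. The function name and its parameters are made up for illustration. Note that `_mm_castsi128_ps` only reinterprets the bits at the source level; whether the compiler actually emits a movaps/movups before the branch is its own decision, so this shows where the domain crossing sits rather than forcing a particular instruction.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, x86 / x86-64 only */

/* Hypothetical example: integer-domain work, a branch, then FP-domain work. */
float domain_crossing_demo(int bits, int mask, float factor, int take_mul_path)
{
    /* Integer-domain instruction (PAND): the result lives in the integer domain. */
    __m128i vi = _mm_and_si128(_mm_set1_epi32(bits), _mm_set1_epi32(mask));

    /* Reinterpret the same 128 bits as four floats BEFORE the branch.
       This is the point where a compiler may place a movaps so that the
       value crosses into the FP domain once, not once per branch path. */
    __m128 vf = _mm_castsi128_ps(vi);

    __m128 r;
    if (take_mul_path)
        r = _mm_mul_ps(vf, _mm_set1_ps(factor));  /* FP-domain consumer, path A */
    else
        r = _mm_add_ps(vf, _mm_set1_ps(factor));  /* FP-domain consumer, path B */

    /* Return the lowest lane as a scalar float. */
    return _mm_cvtss_f32(r);
}
```

Both branch targets start with a floating-point instruction that reads the register, which is exactly the situation where a mispredicted branch could trigger the automatic migration a second time. To check what your compiler really does, look at the generated assembly (for example with `-S`).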
The compiler seems to have decided that it is better to optimize the dependency chains of the calculation than to optimize for code size and resource usage.