Converting from float to int with rounding comes up quite often in C++ code that works with floating point data. One use, for example, is generating conversion tables.
Consider this piece of code:
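In sketch form, the idiom in question looks something like this. The function name RoundPositive is made up for illustration, and FloatValue / RoundedIntValue are simply the names used in the intrinsic attempt further down:

```cpp
// Round a non-negative float to the nearest int by adding 0.5f and truncating.
// RoundPositive is a hypothetical name used only for this sketch.
int RoundPositive(float FloatValue)
{
    // The (int) cast truncates toward zero, so the +0.5f bias turns
    // truncation into round-half-up for non-negative inputs.
    int RoundedIntValue = (int)(FloatValue + 0.5f);
    return RoundedIntValue;
}
```

Note that this only behaves as rounding for non-negative inputs; for negative values the truncation goes the wrong way.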
C/C++ defines the (int) cast as truncating, so you need to add 0.5f to get rounding to the nearest integer (when the input is positive). For the above, the VS2015 compiler generates a three-instruction sequence (load the 0.5f constant, add it, then truncate with cvttss2si), starting with:
movss xmm9, DWORD PTR __real@3f000000
The above works, but it could be more efficient ...
Intel's designers apparently thought the problem was important enough to be solved by a single instruction that does exactly what is needed, namely convert to the nearest integer value: cvtss2si (note that there is only one 't' in the mnemonic).
If cvtss2si were to replace the cvttss2si instruction in the above sequence, two of the three instructions would simply be eliminated (along with the use of an extra xmm register, which could lead to better optimization overall).
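To make the difference between the two instructions concrete, here is a minimal sketch using the SSE intrinsics that correspond to them. The helper names TruncateToInt and RoundToInt are made up for illustration. It shows only the semantic difference and makes no claim about the machine code a given compiler emits for it:

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_set_ss, _mm_cvtss_si32, _mm_cvttss_si32

// Hypothetical helper names, used only for this sketch.

// Truncating conversion: the intrinsic counterpart of cvttss2si (two t's).
int TruncateToInt(float value)
{
    return _mm_cvttss_si32(_mm_set_ss(value));
}

// Rounding conversion: the intrinsic counterpart of cvtss2si (one t).
// It rounds according to the current MXCSR rounding mode, which is
// round-to-nearest-even unless the program has changed it.
int RoundToInt(float value)
{
    return _mm_cvtss_si32(_mm_set_ss(value));
}
```

One caveat: with the default rounding mode, exact halves go to the nearest even integer, so RoundToInt(2.5f) gives 2 where (int)(2.5f + 0.5f) gives 3. The two approaches agree everywhere else for positive inputs of ordinary magnitude.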
So, how can we write C++ statement(s) that get this simple job done with a single cvtss2si instruction?
I have been poking around, trying things like the following, but even with the optimizer on the task, it does not boil down to the one machine instruction that could/should do the job:
int RoundedIntValue = _mm_cvt_ss2si(_mm_set_ss(FloatValue));
Unfortunately, the above seems bent on clearing out an entire vector register that will never be used, instead of just using the single 32-bit value:
movaps xmm1, xmm0
xorps xmm2, xmm2
movss xmm2, xmm1
cvtss2si eax, xmm2
Perhaps I am missing an obvious approach here.
Can you suggest a set of C++ statements that will ultimately generate a single cvtss2si instruction?