In practice, your compiler should do a good job of creating a constant vector for 0.0. It will most likely just zero a register with an XOR (`xorps`), and if your code is in a loop, it should still hoist the constant generation out of the loop. So, bottom line, use your original idea:
v = _mm_sub_ps(_mm_set1_ps(0.0), v);
or another common trick:
v = _mm_xor_ps(v, _mm_set1_ps(-0.0));
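which simply flips the sign bits instead of performing the subtraction. The two are not identical on special values: the subtraction turns +0.0 into +0.0 rather than -0.0, while the XOR is a pure bit operation, so it also flips the sign bit of a NaN and never raises floating-point exceptions. The XOR may also be more efficient in some cases.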
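To make the comparison concrete, here is a minimal sketch of both approaches wrapped in small helpers. The names `negate_sub`, `negate_xor`, and `lane` are made up for illustration, and it assumes an x86 target with SSE available:

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_sub_ps, _mm_xor_ps, ... */

/* First method: negate by subtracting from zero. */
static inline __m128 negate_sub(__m128 v) {
    return _mm_sub_ps(_mm_setzero_ps(), v);
}

/* Second method: negate by flipping the sign bit of each lane. */
static inline __m128 negate_xor(__m128 v) {
    return _mm_xor_ps(v, _mm_set1_ps(-0.0f));
}

/* Hypothetical helper: read lane i (0..3) back out for inspection. */
static inline float lane(__m128 v, int i) {
    float out[4];
    _mm_storeu_ps(out, v);
    return out[i];
}
```

For ordinary nonzero values both helpers should give the same result. Checking the sign bits with `_mm_movemask_ps` is one way to see where they diverge, for example on a lane holding +0.0.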