When I use Visual Studio's built-in compiler to generate AVX2 gather instructions, it does not insert a VXORPS instruction to break the dependency between the gather and the previous instruction that wrote its destination YMM register.
The Intel compiler does insert it, and the result is a noticeable performance improvement from breaking that dependency.
For reasons I won't go into, I cannot use the Intel compiler. Is there a way to force Visual Studio to insert this VXORPS instruction?
I have already tried creating an intermediate __m256i and applying VXORPS to it, but that didn't work.
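For reference, here is a minimal sketch of one variant worth trying, assuming the gather in question is `_mm256_i32gather_epi32`: call the masked form `_mm256_mask_i32gather_epi32` with an explicit all-zero merge source and an all-ones mask. The merge source is an input to the gather, so the compiler has to place a zero in the destination register beforehand, and compilers usually do that with a VPXOR/VXORPS zeroing idiom. Whether MSVC actually emits it this way is not guaranteed. The helper name `gather8` is hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* GCC/Clang need AVX2 enabled per function (or via -mavx2); MSVC does not. */
#if defined(__GNUC__) || defined(__clang__)
#define AVX2_FN __attribute__((target("avx2")))
#else
#define AVX2_FN
#endif

/* Hypothetical helper: gathers base[indices[i]] for i = 0..7 into out.
   The masked intrinsic receives an explicit all-zero merge source, so the
   register the gather writes must hold zero beforehand. Compilers typically
   produce that zero with a dependency-breaking VPXOR/VXORPS idiom. */
AVX2_FN static void gather8(const int32_t *base, const int32_t *indices, int32_t *out)
{
    __m256i idx  = _mm256_loadu_si256((const __m256i *)indices);
    __m256i zero = _mm256_setzero_si256();   /* merge source for disabled lanes */
    __m256i mask = _mm256_set1_epi32(-1);    /* sign bit set: all 8 lanes enabled */
    /* scale 4 = sizeof(int32_t) */
    __m256i v = _mm256_mask_i32gather_epi32(zero, (const int *)base, idx, mask, 4);
    _mm256_storeu_si256((__m256i *)out, v);
}
```

To check the result, compile with `/arch:AVX2` and `/FA` to get an assembly listing, then look for a zeroing instruction on the destination register just before the `vpgatherdd`.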
source
share