The first place I look to answer questions like this is Intel's Intrinsics Guide online. You enter an intrinsic, and it tells you what the intrinsic does and gives its latency and throughput on Nehalem through Haswell processors (Broadwell is coming soon). Here are the results:
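_mm_mul_ps

| CPU          | Latency (cycles) | Reciprocal throughput (cycles) |
|--------------|------------------|--------------------------------|
| Haswell      | 5                | 0.5                            |
| Ivy Bridge   | 5                | 1                              |
| Sandy Bridge | 5                | 1                              |
| Westmere     | 4                | 1                              |
| Nehalem      | 4                | 1                              |

_mm_mul_epi32

| CPU          | Latency (cycles) | Reciprocal throughput (cycles) |
|--------------|------------------|--------------------------------|
| Haswell      | 5                | 1                              |
| Ivy Bridge   | 3                | 1                              |
| Sandy Bridge | 3                | 1                              |
| Westmere     | 3                | 1                              |
| Nehalem      | 3                | 1                              |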
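Note that the two intrinsics do not do the same amount of work per instruction: _mm_mul_ps (SSE, MULPS) produces four single-precision products, while _mm_mul_epi32 (SSE4.1, PMULDQ) multiplies only the signed 32-bit elements 0 and 2 and returns two 64-bit products. A minimal sketch showing both; the wrapper names are hypothetical, and the `target("sse4.1")` attribute is a GCC/Clang extension assumed here so the file compiles without `-msse4.1` (the CPU must still support SSE4.1).

```c
#include <immintrin.h>
#include <stdint.h>

/* _mm_mul_ps: four independent single-precision products. */
static void mul_ps4(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_mul_ps(va, vb));
}

/* _mm_mul_epi32: signed 32-bit elements 0 and 2 only, widened to two
   64-bit products. Elements 1 and 3 of the inputs are ignored. */
__attribute__((target("sse4.1")))
static void mul_epi32_2(const int32_t a[4], const int32_t b[4], int64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_mul_epi32(va, vb));
}
```

For a full four-lane 32-bit integer multiply you would need a different intrinsic, so compare per-instruction numbers with that in mind.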
Lower latency and lower reciprocal throughput are better. From these tables we can conclude that:

- with the exception of Haswell, the latency of _mm_mul_epi32 is lower than that of _mm_mul_ps; on Haswell the latency is the same,
- with the exception of Haswell, the throughput is the same,
- on Haswell, _mm_mul_ps has twice the throughput of _mm_mul_epi32.
The throughput on Haswell is the only real surprise.
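One way to read these numbers: a chain of dependent multiplies costs roughly the latency per multiply, independent multiplies cost roughly the reciprocal throughput each, and latency divided by reciprocal throughput (Little's law) is how many independent multiplies must be in flight to keep the multiplier busy. A minimal C sketch of that arithmetic using the table values; the struct and function names are hypothetical, and the model ignores pipeline fill, port conflicts and memory traffic.

```c
#include <stddef.h>
#include <string.h>

/* Values copied from the Intrinsics Guide tables above, in cycles. */
struct timing {
    const char *cpu;
    double ps_lat, ps_rtp;       /* _mm_mul_ps:    latency, reciprocal throughput */
    double epi32_lat, epi32_rtp; /* _mm_mul_epi32: latency, reciprocal throughput */
};

static const struct timing timings[] = {
    {"Haswell",      5, 0.5, 5, 1},
    {"Ivy Bridge",   5, 1.0, 3, 1},
    {"Sandy Bridge", 5, 1.0, 3, 1},
    {"Westmere",     4, 1.0, 3, 1},
    {"Nehalem",      4, 1.0, 3, 1},
};

/* Look up a CPU by name; returns NULL if it is not in the table. */
static const struct timing *find_timing(const char *cpu)
{
    for (size_t i = 0; i < sizeof timings / sizeof timings[0]; i++)
        if (strcmp(timings[i].cpu, cpu) == 0)
            return &timings[i];
    return NULL;
}

/* n multiplies that each depend on the previous result: latency-bound. */
static double chain_cycles(double lat, long n) { return lat * (double)n; }

/* n independent multiplies in steady state: throughput-bound. */
static double independent_cycles(double rtp, long n) { return rtp * (double)n; }

/* Independent multiplies needed in flight to saturate the unit. */
static double ops_in_flight(double lat, double rtp) { return lat / rtp; }
```

On Haswell this gives 10 for _mm_mul_ps and 5 for _mm_mul_epi32, so a single dependent chain runs both at the same speed; the doubled _mm_mul_ps throughput only shows up when there is enough independent work.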
If you want results for pre-Nehalem processors and/or AMD processors, see Agner Fog's instruction tables, or run the test programs he used to measure latency and throughput.
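Agner Fog's programs read hardware performance counters. As a much rougher stand-in, here is a minimal sketch of the same idea for _mm_mul_ps: latency is timed with one dependent chain, throughput with ten independent chains (5 / 0.5 on Haswell). It assumes x86-64, POSIX `clock_gettime`, and an optimized build (e.g. `-O2`); the function names are hypothetical. It reports nanoseconds, not cycles, so converting needs the actual core clock, which turbo and frequency scaling make uncertain.

```c
#define _POSIX_C_SOURCE 199309L
#include <immintrin.h>
#include <time.h>

static volatile float sink;       /* keeps results observable */
static volatile float one = 1.0f; /* read at run time so x * 1 cannot be folded */

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Latency: every multiply consumes the previous result. */
static double mul_ps_latency_ns(long iters)
{
    __m128 m = _mm_set1_ps(one);
    __m128 x = _mm_set1_ps(1.0f);
    double t0 = now_ns();
    for (long i = 0; i < iters; i++)
        x = _mm_mul_ps(x, m);
    double t1 = now_ns();
    sink = _mm_cvtss_f32(x);
    return (t1 - t0) / (double)iters;
}

/* Throughput: ten independent chains keep several multiplies in flight. */
static double mul_ps_throughput_ns(long iters)
{
    __m128 m = _mm_set1_ps(one);
    __m128 acc[10];
    for (int k = 0; k < 10; k++)
        acc[k] = _mm_set1_ps(1.0f);
    double t0 = now_ns();
    for (long i = 0; i < iters; i++)
        for (int k = 0; k < 10; k++)
            acc[k] = _mm_mul_ps(acc[k], m);
    double t1 = now_ns();
    __m128 s = acc[0];
    for (int k = 1; k < 10; k++)
        s = _mm_add_ps(s, acc[k]);
    sink = _mm_cvtss_f32(s);
    return (t1 - t0) / ((double)iters * 10.0);
}
```

The ratio of the two results approximates latency / reciprocal throughput; on a Haswell-class CPU it would be expected to land somewhere near 10, though loop overhead and clock changes can shift it.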