Integer multiplication is common, but it is not one of the most common things done with integers. With floating-point numbers, however, multiplication and addition are used together all the time, and FMA gives a major speedup to a lot of FP code that is ALU-bound.
In addition, floating point actually avoids a loss of precision with FMA (the internal temporary x*y is not rounded at all before the addition). This is why the ISO C99 / C++ math library function fma() exists, and why it is slow to implement without hardware FMA support.
Integer FMA (or multiply-accumulate, aka MAC) has no precision advantage over a separate multiply and add: integer multiply and add are already exact modulo 2^n, so there is no intermediate rounding to avoid.
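A small check of that claim, as an illustrative sketch with hypothetical function names: wrapping 32-bit multiply then add gives the same result as computing the full 64-bit product, adding, and truncating once. There is no rounding step that a fused instruction could skip.

```c
#include <stdint.h>

/* Separate multiply then add, each wrapping mod 2^32. */
uint32_t mac_separate(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t prod = a * b;   /* wraps, but loses nothing modulo 2^32 */
    return prod + c;
}

/* "Fused" reference: full 64-bit product plus c, truncated once at the end. */
uint32_t mac_wide(uint32_t a, uint32_t b, uint32_t c)
{
    uint64_t full = (uint64_t)a * b + c;
    return (uint32_t)full;
}
```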
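Some non-x86 ISAs do provide integer FMA. It is not useless, but neither Intel nor AMD included it until AVX512-IFMA (and that is only for SIMD, essentially exposing the 52-bit mantissa multipliers that double-precision FMA / vmulpd already need as integer instructions).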
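The 52-bit connection is easier to see in a scalar model of one 64-bit lane of the AVX512-IFMA instructions. Based on the Intel SDM description, VPMADD52LUQ multiplies the low 52 bits of each input and adds the low 52 bits of the 104-bit product to a 64-bit accumulator, and VPMADD52HUQ adds the high 52 bits instead. The sketch below models that in portable C, with hypothetical function names, and assumes the GCC/Clang unsigned __int128 extension for the wide product.

```c
#include <stdint.h>

#define MASK52 ((UINT64_C(1) << 52) - 1)

/* Model of one lane of VPMADD52LUQ: acc + low 52 bits of (a[51:0] * b[51:0]). */
uint64_t madd52lo(uint64_t acc, uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)(a & MASK52) * (b & MASK52);
    return acc + (uint64_t)(p & MASK52);
}

/* Model of one lane of VPMADD52HUQ: acc + bits [103:52] of the same product. */
uint64_t madd52hi(uint64_t acc, uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)(a & MASK52) * (b & MASK52);
    return acc + (uint64_t)(p >> 52);
}
```

Using both halves lets big-integer code accumulate 52-bit limb products without carries spilling out of each 64-bit lane right away.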
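Non-x86 examples include:
MIPS32: madd / maddu (unsigned) multiply-accumulate into the hi / lo registers (the special registers that the regular multiply and divide instructions use as their destination).
ARM: smlal and related instructions (32x32 => 64-bit MAC, or 16x16 => 32-bit), with unsigned versions such as umlal as well. The operands are the regular R0..R15 general-purpose registers.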
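Both of these are a 32x32 => 64-bit multiply added into a 64-bit accumulator that is held in a pair of 32-bit registers. The C sketch below models that semantics. The acc64 struct and the function names are hypothetical stand-ins for hi/lo or RdHi:RdLo, not a real API.

```c
#include <stdint.h>

/* Hypothetical 64-bit accumulator split across two 32-bit registers. */
typedef struct { uint32_t hi, lo; } acc64;

/* Signed 32x32 => 64 multiply-accumulate (like MIPS madd or ARM smlal). */
void mac_signed(acc64 *acc, int32_t a, int32_t b)
{
    uint64_t sum = ((uint64_t)acc->hi << 32) | acc->lo;
    sum += (uint64_t)((int64_t)a * b);   /* two's complement, wraps mod 2^64 */
    acc->hi = (uint32_t)(sum >> 32);
    acc->lo = (uint32_t)sum;
}

/* Unsigned version (like MIPS maddu or ARM umlal). */
void mac_unsigned(acc64 *acc, uint32_t a, uint32_t b)
{
    uint64_t sum = ((uint64_t)acc->hi << 32) | acc->lo;
    sum += (uint64_t)a * b;
    acc->hi = (uint32_t)(sum >> 32);
    acc->lo = (uint32_t)sum;
}
```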
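An integer FMA for general-purpose registers would be useful on x86, but uops with 3 integer inputs are rare. CMOV and ADC have 3 inputs, but one of them is FLAGS. Even so, on Intel they did not decode to a single uop until Broadwell, after support for 3-input uops had been added for FP FMA in Haswell.
Haswell and later can also keep fused-domain uops with 3 integer inputs, for micro-fused indexed addressing modes: Sandybridge/Ivybridge un-laminate instructions like add eax, [rdx+rcx], while Haswell can keep them micro-fused. (Nehalem could keep them micro-fused, I think; SnB simplified the internal uop format.) That only applies to the fused domain, though, not to the unfused domain the scheduler works with. Broadwell/Skylake decode 3-input instructions with 2 registers + flags to a single uop, but that is still not the 3 full register inputs an integer FMA would need.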
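Intel's internal uop format presumably already allows 3 inputs for FP uops, since FP FMA has 3 inputs, so an integer version might not be technically hard, IDK. But IDK, Intel introduced BMI2 alongside FMA (in Haswell), and for integer multiply they added mulx instead (a 2-input, 2-output multiply with explicit destination registers, unlike mul, which always writes rdx:rax).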
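Without an integer FMA, a multiply-accumulate on x86 is a separate full-width multiply followed by add / adc. The sketch below shows that pattern in C, assuming the GCC/Clang unsigned __int128 extension. The function names are hypothetical. When compiling with -mbmi2, GCC and Clang can typically use mulx for the 64x64 => 128-bit multiply, and otherwise mul.

```c
#include <stdint.h>

/* 64x64 => 128-bit unsigned multiply with two explicit outputs,
 * the operation mulx performs (low half returned, high half via *hi). */
uint64_t mul_wide(uint64_t a, uint64_t b, uint64_t *hi)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    return (uint64_t)p;
}

/* Multiply-accumulate into a 128-bit accumulator the x86 way:
 * separate multiply, then add the low halves and add-with-carry the high. */
void mac128(uint64_t *acc_lo, uint64_t *acc_hi, uint64_t a, uint64_t b)
{
    uint64_t hi;
    uint64_t lo = mul_wide(a, b, &hi);
    uint64_t new_lo = *acc_lo + lo;
    *acc_hi += hi + (new_lo < lo);   /* carry out of the low half */
    *acc_lo = new_lo;
}
```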
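SSE2/SSSE3 do have integer mul-add instructions, but only horizontal ones, like 16x16 => 32-bit (SSE2 pmaddwd) or (unsigned) 8 x (signed) 8 => 16-bit (SSSE3 pmaddubsw).
They take only 2 input vectors and add adjacent products to each other rather than to a third accumulator operand, so they are quite different from an FMA.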
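The horizontal behaviour is clearest in a scalar model of one 128-bit vector. The sketch below follows the documented semantics: pmaddwd sums adjacent int16 products into int32 (wrapping in the single overflow case where all four inputs are -32768), and pmaddubsw multiplies unsigned bytes from the first operand by signed bytes from the second and sums adjacent pairs with signed 16-bit saturation. The function names are hypothetical.

```c
#include <stdint.h>

/* Model of SSE2 pmaddwd: 8 x int16 times 8 x int16 => 4 x int32,
 * each result is the sum of two adjacent products. */
void pmaddwd_model(const int16_t a[8], const int16_t b[8], int32_t dst[4])
{
    for (int i = 0; i < 4; i++) {
        int64_t s = (int64_t)a[2*i] * b[2*i] + (int64_t)a[2*i+1] * b[2*i+1];
        /* Only -32768 * -32768 twice exceeds INT32_MAX; the instruction wraps it. */
        dst[i] = (s > INT32_MAX) ? INT32_MIN : (int32_t)s;
    }
}

static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Model of SSSE3 pmaddubsw: 16 x uint8 times 16 x int8 => 8 x int16,
 * adjacent products summed with signed saturation. */
void pmaddubsw_model(const uint8_t a[16], const int8_t b[16], int16_t dst[8])
{
    for (int i = 0; i < 8; i++) {
        int32_t s = a[2*i] * b[2*i] + a[2*i+1] * b[2*i+1];
        dst[i] = sat16(s);
    }
}
```

Neither function has an accumulator input, which is the difference from a true FMA: accumulating still takes a separate paddd / paddw.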
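Note that x86 does have scalar FMA, just not for integer registers: scalar FP FMA is part of FMA3, e.g. VFMADD231SD and the other scalar vfmaddXXXsd / vfmaddXXXss forms, which operate on XMM registers.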