It depends on the range of your operands. If they fit within 16 bits, then even prior to SSE4 there are several 16-bit × 16-bit SSE instructions (e.g. _mm_madd_epi16, _mm_mulhi_epi16, _mm_mullo_epi16, _mm_mulhrs_epi16, etc.).
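As a rough illustration of the 16-bit case, here is a minimal sketch that combines the low and high halves of each product into full 32-bit results. It assumes SSE2 and signed inputs; the helper name mul16x8 is made up for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: multiply eight signed 16-bit pairs and
   store the eight full 32-bit products. */
static void mul16x8(const int16_t a[8], const int16_t b[8], int32_t out[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    __m128i lo = _mm_mullo_epi16(va, vb);  /* low 16 bits of each product */
    __m128i hi = _mm_mulhi_epi16(va, vb);  /* high 16 bits (signed multiply) */

    /* Interleave low/high halves so each 32-bit lane holds one product. */
    _mm_storeu_si128((__m128i *)&out[0], _mm_unpacklo_epi16(lo, hi));
    _mm_storeu_si128((__m128i *)&out[4], _mm_unpackhi_epi16(lo, hi));
}
```

If you only need the low 16 bits of each product, _mm_mullo_epi16 alone is enough and the unpack step can be dropped.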
If you need 32-bit operands and they are unsigned, you can use _mm_mul_epu32.
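Note that _mm_mul_epu32 only multiplies the two even-indexed 32-bit lanes, giving two 64-bit products, so covering all four lanes takes two calls. A minimal sketch under that assumption (the helper name mul32x4_u is hypothetical):

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: multiply four unsigned 32-bit pairs and
   store the four full 64-bit products. */
static void mul32x4_u(const uint32_t a[4], const uint32_t b[4], uint64_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Lanes 0 and 2: a0*b0 and a2*b2 as 64-bit results. */
    __m128i even = _mm_mul_epu32(va, vb);

    /* Shift lanes 1 and 3 down into the even positions, then multiply. */
    __m128i odd = _mm_mul_epu32(_mm_srli_epi64(va, 32),
                                _mm_srli_epi64(vb, 32));

    /* Restore the original element order: {p0, p1} and {p2, p3}. */
    _mm_storeu_si128((__m128i *)&out[0], _mm_unpacklo_epi64(even, odd));
    _mm_storeu_si128((__m128i *)&out[2], _mm_unpackhi_epi64(even, odd));
}
```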
Alternatively, you can convert to float and use _mm_mul_ps (integer ↔ float conversion with SSE is quite efficient, and the cost can be justified if you get 4x SIMD throughput).
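The float route might look like the sketch below. It assumes signed 32-bit inputs whose products stay below 2^24 in magnitude, since a single-precision float has a 24-bit significand and larger products would be rounded. The helper name mul32x4_via_float is made up for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (includes the SSE float ops) */
#include <stdint.h>

/* Hypothetical helper: multiply four signed 32-bit pairs by going
   through single-precision floats. Exact only while each product
   fits in 24 bits of magnitude. */
static void mul32x4_via_float(const int32_t a[4], const int32_t b[4],
                              int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    __m128 fa = _mm_cvtepi32_ps(va);    /* int32 -> float, 4 lanes */
    __m128 fb = _mm_cvtepi32_ps(vb);
    __m128 prod = _mm_mul_ps(fa, fb);   /* 4 multiplies at once */

    /* float -> int32 (rounds using the current rounding mode). */
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(prod));
}
```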