What is the inverse of "_mm256_cvtepi16_epi32"?

I need an AVX2 (or earlier) intrinsic that converts an 8-wide vector of 32-bit integers (256 bits total) into an 8-wide vector of 16-bit integers (128 bits total), discarding the upper 16 bits of each element. This should be the inverse of "_mm256_cvtepi16_epi32". If there is no direct instruction, what is the best way to do this with a sequence of instructions?
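To pin down the requested semantics, here is a minimal scalar reference (the name trunc_epi32_epi16_ref is hypothetical, not from the question). It keeps the low 16 bits of each element, which round-trips anything produced by _mm256_cvtepi16_epi32:

```c
#include <stdint.h>

/* Hypothetical scalar reference: keep the low 16 bits of each of 8 int32 elements. */
static void trunc_epi32_epi16_ref(const int32_t src[8], int16_t dst[8])
{
    for (int i = 0; i < 8; i++) {
        /* int32 -> uint16 is well defined (mod 2^16). uint16 -> int16 for values
           above 32767 is implementation-defined in ISO C; GCC and Clang wrap. */
        dst[i] = (int16_t)(uint16_t)src[i];
    }
}
```

The SIMD versions in the answer must produce the same bits as this loop.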

1 answer

There is no single-instruction inverse until AVX512F: __m128i _mm256_cvtepi32_epi16(__m256i a) (VPMOVDW), which is also available for 512->256 or 128->low_half_of_128. (Versions with inputs narrower than a 512-bit ZMM register also require AVX512VL, so only Skylake-X, not Xeon Phi KNL.)

There are signed and unsigned saturating versions of that AVX512 instruction (VPMOVSDW / VPMOVUSDW), but only AVX512 has a pack instruction that truncates (discarding the upper bytes of each element) instead of saturating.
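A minimal sketch contrasting the three AVX512VL narrowing intrinsics, assuming GCC or Clang (for the target attribute) and a CPU with AVX512F + AVX512VL. The helper names and the mode parameter are hypothetical, added only so the behaviour can be compared side by side:

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang target attribute, so no global -mavx512vl flag is needed. */
#define AVX512VL_FN __attribute__((target("avx512f,avx512vl")))

/* VPMOVDW: truncate each dword to its low 16 bits. */
AVX512VL_FN static __m128i trunc_avx512(__m256i v) { return _mm256_cvtepi32_epi16(v); }

/* VPMOVSDW: signed saturation to [-32768, 32767]. */
AVX512VL_FN static __m128i sat_avx512(__m256i v) { return _mm256_cvtsepi32_epi16(v); }

/* VPMOVUSDW: inputs treated as unsigned, saturated to [0, 65535]. */
AVX512VL_FN static __m128i usat_avx512(__m256i v) { return _mm256_cvtusepi32_epi16(v); }

/* Hypothetical array wrapper: mode 0 = truncate, 1 = signed sat, 2 = unsigned sat. */
AVX512VL_FN static void narrow8_avx512(const int32_t src[8], int16_t dst[8], int mode)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    __m128i r = mode == 0 ? trunc_avx512(v) : mode == 1 ? sat_avx512(v) : usat_avx512(v);
    _mm_storeu_si128((__m128i *)dst, r);
}
```

Only the mode 0 variant is the truncating inverse of _mm256_cvtepi16_epi32.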

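Or with AVX512BW, you can emulate a lane-crossing 2-input pack with vpermi2w, producing a 512-bit result from two 512-bit input vectors. On Skylake-AVX512 it decodes to multiple shuffle uops, but VPMOVDW is not a single uop either: it is also a multi-uop narrowing shuffle working at dword (32-bit) granularity. (http://instlatx64.atw.hu/ has spreadsheets of SKX uops/ports.)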
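A sketch of that two-input pack, assuming GCC/Clang and an AVX512BW CPU. _mm512_permutex2var_epi16 compiles to vpermi2w or vpermt2w (the compiler picks which operand to overwrite); the function names are hypothetical:

```c
#include <immintrin.h>
#include <stdint.h>

/* Narrow 32 dwords (two ZMM vectors) to 32 words (one ZMM) with one 2-source shuffle. */
__attribute__((target("avx512f,avx512bw")))
static __m512i pack_trunc_epi32_avx512bw(__m512i a, __m512i b)
{
    /* Result word i = word 2*i of the 64-word concatenation b:a (the low word of
       dword i). Index bit 5 selects b, so indices 32..62 read b's low words. */
    static const uint16_t idx_words[32] = {
         0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30,
        32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62 };
    __m512i idx = _mm512_loadu_si512(idx_words);
    return _mm512_permutex2var_epi16(a, idx, b);
}

/* Hypothetical array wrapper so callers need no AVX-512 compile flags. */
__attribute__((target("avx512f,avx512bw")))
static void narrow32_avx512bw(const int32_t src[32], int16_t dst[32])
{
    __m512i a = _mm512_loadu_si512(src);
    __m512i b = _mm512_loadu_si512(src + 16);
    _mm512_storeu_si512(dst, pack_trunc_epi32_avx512bw(a, b));
}
```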


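With SSE4.1/AVX2 there is _mm256_packus_epi32 (vpackusdw; the 128-bit form needs SSE4.1, since SSE2 only has the signed-saturating packssdw), but it saturates instead of truncating, and the 256-bit version works within each 128-bit lane, unlike the lane-crossing vpmovzxwd.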

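You can use _mm256_and_si256 to clear the upper 16 bits of each element first, so every value is in [0, 65535] and the unsigned saturation never kicks in. That works well if you have multiple input vectors, because packus_epi32 takes 2 input vectors and produces a full 256-bit output.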

a = H G F E | D C B A    32-bit signed elements, shown from high element to low element, low 128-bit lane on the right
b = P O N M | L K J I

_mm256_packus_epi32(a, b)   16-bit unsigned elements
    P O N M H G F E  |  L K J I D C B A
      elements from first operand go to the low half of each lane

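So 2x vpand / vpackusdw ymm / vpermq ymm gives you a 256-bit vector with all 16 elements in the right order. That is efficient on Intel: only 2 shuffle uops (4 uops total) per 256 bits of results.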
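A minimal sketch of that sequence, assuming GCC/Clang (target attribute) and an AVX2 CPU; the function names are hypothetical. The vpermq immediate _MM_SHUFFLE(3, 1, 2, 0) (0xD8) reorders the qwords from the diagram's in-lane order into A..P:

```c
#include <immintrin.h>
#include <stdint.h>

/* 16 dwords (two YMM) -> 16 words (one YMM): 2x vpand, vpackusdw, vpermq. */
__attribute__((target("avx2")))
static __m256i pack_trunc_epi32_avx2(__m256i a, __m256i b)
{
    const __m256i low16 = _mm256_set1_epi32(0xFFFF);
    /* Clear the upper 16 bits: every dword is now in [0, 65535], so the
       unsigned saturation in vpackusdw never triggers and it acts as truncation. */
    a = _mm256_and_si256(a, low16);
    b = _mm256_and_si256(b, low16);
    /* In-lane pack; qwords from low to high: A..D, I..L, E..H, M..P */
    __m256i packed = _mm256_packus_epi32(a, b);
    /* Lane-crossing fixup: take qwords 0,2,1,3 -> A..H, I..P */
    return _mm256_permute4x64_epi64(packed, _MM_SHUFFLE(3, 1, 2, 0));
}

/* Hypothetical array wrapper used for testing. */
__attribute__((target("avx2")))
static void narrow16_avx2(const int32_t src[16], int16_t dst[16])
{
    __m256i a = _mm256_loadu_si256((const __m256i *)src);
    __m256i b = _mm256_loadu_si256((const __m256i *)(src + 8));
    _mm256_storeu_si256((__m256i *)dst, pack_trunc_epi32_avx2(a, b));
}
```

With a single input vector, you could pass it as both a and b and keep only the low 128 bits of the result, but then half of each shuffle's work is wasted.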


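With SSSE3/AVX2 you can use vpshufb (_mm256_shuffle_epi8) to grab the bytes you want into the low 64 bits of each 128-bit lane (zeroing the other elements by setting their mask bytes to -1). Then use AVX2 vpermq to move the data from the high lane down into the low 128 bits.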

__m256i trunc_elements = _mm256_shuffle_epi8(res256, shuffle_mask_32_to_16);
__m256i ordered = _mm256_permute4x64_epi64(trunc_elements, 0x58);
__m128i result  = _mm256_castsi256_si128(ordered);   // no asm instructions
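A self-contained version of the snippet above, assuming GCC/Clang and an AVX2 CPU. The contents of shuffle_mask_32_to_16 are an assumption consistent with the 0x58 immediate: each lane's truncated words land in that lane's low qword, and vpermq brings qword 2 down to qword 1:

```c
#include <immintrin.h>
#include <stdint.h>

__attribute__((target("avx2")))
static __m128i trunc_epi32_epi16_avx2(__m256i res256)
{
    /* Per 128-bit lane: bytes 0,1,4,5,8,9,12,13 (low word of each dword) go to the
       low 8 bytes; -1 (high bit set) zeroes the upper 8 bytes. */
    const __m256i shuffle_mask_32_to_16 = _mm256_setr_epi8(
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1,
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1);
    __m256i trunc_elements = _mm256_shuffle_epi8(res256, shuffle_mask_32_to_16);
    /* 0x58 == _MM_SHUFFLE(1, 1, 2, 0): low 128 bits become qwords 0 and 2 */
    __m256i ordered = _mm256_permute4x64_epi64(trunc_elements, 0x58);
    return _mm256_castsi256_si128(ordered);   /* no asm instruction */
}

/* Hypothetical array wrapper used for testing. */
__attribute__((target("avx2")))
static void narrow8_avx2_shuffle(const int32_t src[8], int16_t dst[8])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm_storeu_si128((__m128i *)dst, trunc_epi32_epi16_avx2(v));
}
```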

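That is 2 shuffles per 128 bits of results, and on mainstream Intel CPUs with AVX2 both shuffles can only run on port 5. This is fine as part of a loop that does plenty of other work to keep port0/port1 busy, but if the truncation is all the loop does, shuffle throughput is the bottleneck: at best one 128-bit result per 2 clocks.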


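On Ryzen/Excavator, vpermq is slow (they split 256-bit operations into 128-bit halves, so lane-crossing shuffles take multiple uops; see http://agner.org/optimize/). There you would rather use vextracti128 / vpor to combine the halves. Or use vpunpcklqdq instead, which lets you build the shuffle mask with _mm256_set1_epi64x rather than needing a full 256-bit vector constant that shifts the upper lane's elements up by 64 bits.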
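A sketch of both vpermq-free variants, assuming GCC/Clang and an AVX2 CPU; the function names and the use_unpck switch are hypothetical. The broadcast constant 0x0D0C090805040100 is the byte pattern 0,1,4,5,8,9,12,13 in little-endian order:

```c
#include <immintrin.h>
#include <stdint.h>

/* (a) vpshufb + vextracti128 + vpor: needs a full 256-bit mask so the high
   lane's words land in the upper 64 bits of that lane. */
__attribute__((target("avx2")))
static __m128i trunc_epi32_epi16_por(__m256i v)
{
    const __m256i mask = _mm256_setr_epi8(
        0, 1, 4, 5, 8, 9, 12, 13, -1, -1, -1, -1, -1, -1, -1, -1,
        -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 4, 5, 8, 9, 12, 13);
    __m256i t = _mm256_shuffle_epi8(v, mask);
    return _mm_or_si128(_mm256_castsi256_si128(t), _mm256_extracti128_si256(t, 1));
}

/* (b) vpshufb + vextracti128 + vpunpcklqdq: both lanes use the same pattern, so
   the mask is a broadcast 64-bit constant; the upper qword of each lane is ignored. */
__attribute__((target("avx2")))
static __m128i trunc_epi32_epi16_unpck(__m256i v)
{
    const __m256i mask = _mm256_set1_epi64x(0x0D0C090805040100LL);
    __m256i t = _mm256_shuffle_epi8(v, mask);
    return _mm_unpacklo_epi64(_mm256_castsi256_si128(t), _mm256_extracti128_si256(t, 1));
}

/* Hypothetical array wrapper used for testing either variant. */
__attribute__((target("avx2")))
static void narrow8_avx2_noperm(const int32_t src[8], int16_t dst[8], int use_unpck)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    __m128i r = use_unpck ? trunc_epi32_epi16_unpck(v) : trunc_epi32_epi16_por(v);
    _mm_storeu_si128((__m128i *)dst, r);
}
```

Variant (b) trades the 32-byte mask constant for an 8-byte one that the compiler can broadcast-load.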


Source: https://habr.com/ru/post/1695944/

