With SIMD, how can I expand an 8-bit mask to a 16-bit mask?

I am trying to rewrite this piece of code with SIMD:

int16_t v;
int32_t a[16];
int8_t b[32];
...
((int16_t *)&a[i])[0] = b[i]==1? -v:v;
((int16_t *)&a[i])[1] = b[i]==1? -v:v;

I thought I could use _mm256_cmpeq_epi8 to generate a vector mask, and then use _mm256_and_si256 and _mm256_andnot_si256 to select between the two values.
The problem is that b[i] is an 8-bit integer and v is a 16-bit one.
If the mask vector looks like {0xff, 0x00, 0xff, 0x00, ...}, it must be expanded to {0xffff, 0x0000, 0xffff, 0x0000, ...} to select 16-bit values.
How can I do this? (Sorry for my English)
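
For reference, the and/andnot selection mentioned above could look roughly like this once a 16-bit mask is available. This is only a sketch: select_epi16 and select_neg_or_pos are hypothetical helper names, and the target attribute is a GCC/Clang extension used so the code builds without -mavx2 (drop it if you already compile with that flag).

```c
#include <immintrin.h>
#include <stdint.h>

/* Per 16-bit element: take if_set where the mask is 0xFFFF,
   if_clear where the mask is 0x0000. */
__attribute__((target("avx2")))
static __m256i select_epi16(__m256i mask16, __m256i if_set, __m256i if_clear)
{
    __m256i taken   = _mm256_and_si256(mask16, if_set);      /*  mask & if_set   */
    __m256i skipped = _mm256_andnot_si256(mask16, if_clear); /* ~mask & if_clear */
    return _mm256_or_si256(taken, skipped);
}

/* Hypothetical array wrapper: out[i] = mask16[i] ? -v : v for 16 elements.
   mask16[i] is expected to be either 0 or -1 (all bits set). */
__attribute__((target("avx2")))
static void select_neg_or_pos(const int16_t mask16[16], int16_t v, int16_t out[16])
{
    __m256i m = _mm256_loadu_si256((const __m256i *)mask16);
    __m256i r = select_epi16(m,
                             _mm256_set1_epi16((int16_t)-v),
                             _mm256_set1_epi16(v));
    _mm256_storeu_si256((__m256i *)out, r);
}
```

The open part of the question is how to get from the 8-bit compare result to the mask16 input that this helper expects.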

Edit:
I found a solution inspired by this question.
_mm256_shuffle_epi8 can only shuffle within 128-bit lanes, so I split the __m256i mask into two __m128i registers. Then, with _mm256_broadcastsi128_si256 and _mm256_shuffle_epi8, I got the result.
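
A minimal sketch of the broadcast-plus-shuffle idea from the edit, not the exact code that was used: the helper names expand_mask8_to_mask16 and expand_mask_bytes and the shuffle control vector are assumptions made for illustration. The target attribute is a GCC/Clang extension so the sketch builds without -mavx2.

```c
#include <immintrin.h>
#include <stdint.h>

/* Expand 16 byte masks (0x00 or 0xFF) into 16 word masks (0x0000 or 0xFFFF). */
__attribute__((target("avx2")))
static __m256i expand_mask8_to_mask16(__m128i mask8)
{
    /* Copy the 16 mask bytes into both 128-bit lanes. */
    __m256i both = _mm256_broadcastsi128_si256(mask8);
    /* The shuffle works per lane: the low lane duplicates bytes 0..7,
       the high lane duplicates bytes 8..15. */
    const __m256i ctrl = _mm256_setr_epi8(
        0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7,
        8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15);
    return _mm256_shuffle_epi8(both, ctrl);
}

/* Hypothetical array wrapper: 32 byte masks in, 32 word masks out. */
__attribute__((target("avx2")))
static void expand_mask_bytes(const int8_t mask8[32], int16_t mask16[32])
{
    __m256i m  = _mm256_loadu_si256((const __m256i *)mask8);
    /* Split the 256-bit mask into its two 128-bit halves. */
    __m256i lo = expand_mask8_to_mask16(_mm256_castsi256_si128(m));
    __m256i hi = expand_mask8_to_mask16(_mm256_extracti128_si256(m, 1));
    _mm256_storeu_si256((__m256i *)mask16, lo);
    _mm256_storeu_si256((__m256i *)(mask16 + 16), hi);
}
```

Each half of the 8-bit mask yields one full 256-bit register of 16-bit masks, so a 32-byte compare result needs two such registers.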

1 answer

Here is a solution:

int16_t v;
int32_t a[16];
int8_t b[32];

//((int16_t *)&a[i])[0] = b[i]==1? -v:v;
//((int16_t *)&a[i])[1] = b[i]==1? -v:v;

__m256i _1 = _mm256_set1_epi8(1);
__m256i _b = _mm256_loadu_si256((__m256i*)b);

__m256i mask8i = _mm256_cmpeq_epi8(_b, _1); // 8-bit compare mask

// reorder the 64-bit quarters to 0,2,1,3 so the per-lane unpacks below keep the bytes in order
__m256i permutedMask8i = _mm256_permute4x64_epi64(mask8i, 0xD8);
__m256i mask16iLo = _mm256_unpacklo_epi8(permutedMask8i, permutedMask8i); // low part of 16-bit compare mask
__m256i mask16iHi = _mm256_unpackhi_epi8(permutedMask8i, permutedMask8i); // high part of 16-bit compare mask

__m256i positiveV = _mm256_set1_epi16(-v); // value chosen where the mask is set (b[i] == 1)
__m256i negativeV = _mm256_set1_epi16(v);  // value chosen where the mask is clear (b[i] != 1)

__m256i _aLo = _mm256_blendv_epi8(negativeV, positiveV, mask16iLo);
__m256i _aHi = _mm256_blendv_epi8(negativeV, positiveV, mask16iHi);

_mm256_storeu_si256((__m256i*)a + 0, _aLo);
_mm256_storeu_si256((__m256i*)a + 1, _aHi);
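
To check the result against plain C, the steps above can be wrapped in a function and compared with a scalar loop. This is a sketch under assumptions: fill_scalar and fill_avx2 are hypothetical names, the output is written as 32 consecutive int16_t values (the same 64 bytes as int32_t a[16]), and the target attribute is a GCC/Clang extension so the code builds without -mavx2. Note that the vector code produces one 16-bit value per byte of b, for all 32 bytes.

```c
#include <immintrin.h>
#include <stdint.h>

/* Scalar reference: one 16-bit output per byte of b. */
static void fill_scalar(int16_t out[32], const int8_t b[32], int16_t v)
{
    for (int i = 0; i < 32; ++i)
        out[i] = (b[i] == 1) ? (int16_t)-v : v;
}

/* AVX2 version, following the same steps as the code above. */
__attribute__((target("avx2")))
static void fill_avx2(int16_t out[32], const int8_t b[32], int16_t v)
{
    __m256i ones   = _mm256_set1_epi8(1);
    __m256i bytes  = _mm256_loadu_si256((const __m256i *)b);
    __m256i mask8  = _mm256_cmpeq_epi8(bytes, ones);        /* 0xFF where b[i] == 1 */

    /* Reorder 64-bit quarters to 0,2,1,3 so the per-lane unpacks stay in order. */
    __m256i perm   = _mm256_permute4x64_epi64(mask8, 0xD8);
    __m256i maskLo = _mm256_unpacklo_epi8(perm, perm);      /* masks for b[0..15]  */
    __m256i maskHi = _mm256_unpackhi_epi8(perm, perm);      /* masks for b[16..31] */

    __m256i ifSet   = _mm256_set1_epi16((int16_t)-v);
    __m256i ifClear = _mm256_set1_epi16(v);

    /* blendv picks the second operand where the mask byte has its top bit set. */
    __m256i lo = _mm256_blendv_epi8(ifClear, ifSet, maskLo);
    __m256i hi = _mm256_blendv_epi8(ifClear, ifSet, maskHi);

    _mm256_storeu_si256((__m256i *)out, lo);
    _mm256_storeu_si256((__m256i *)(out + 16), hi);
}
```

Calling both functions on the same b and v and comparing the two 32-element outputs is a quick way to confirm that the permute and unpack steps keep the elements in order.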

Source: https://habr.com/ru/post/1670355/

