SIMD: find the min / max value from __m128i

I want to find the minimum / maximum value in an array of bytes using SIMD operations. So far I can walk the array and accumulate the minimum / maximum in a __m128i variable, but that leaves the value I'm looking for mixed in with the other elements (15 of them, to be precise).

I found discussions here and here for integers, and this page for floats, but I don't understand how _mm_shuffle_* works. So my questions are:

  • Which SIMD operations are needed to extract the minimum / maximum byte value (signed or unsigned) from a __m128i variable?
  • How does _mm_shuffle_* work? I can't figure it out from the "minimal" documentation available online. I know it has to do with the _MM_SHUFFLE macro, but I don't get the example.
3 answers

Here is an example of horizontal max for uint8_t:

#include "tmmintrin.h" // requires SSSE3

__m128i _mm_hmax_epu8(const __m128i v)
{
    __m128i vmax = v;

    // _mm_alignr_epi8(vmax, vmax, n) rotates the vector by n bytes, so
    // after rotating by 1, 2, 4 and 8 every byte has seen all 16 inputs.
    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 1));
    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 2));
    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 4));
    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 8));

    return vmax;
}

The maximum value is returned in all elements. If you need it as a scalar, use _mm_extract_epi8 (which requires SSE4.1).

It should be pretty obvious how to adapt this for min, and for signed min / max.
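Since the question also asks what _MM_SHUFFLE does, here is a minimal sketch of the min variant written with shuffles, using SSE2 only. The helper names hmin_epu8_sse2 and hmin_epi8_sse2 are made up for illustration and are not from the answer above. _MM_SHUFFLE(z, y, x, w) just packs four 2-bit indices into one byte, (z<<6)|(y<<4)|(x<<2)|w, and _mm_shuffle_epi32 then fills result dword 0 from source dword w, dword 1 from x, dword 2 from y and dword 3 from z. The signed version assumes the usual trick of flipping the sign bit so that signed order maps onto unsigned order.

```c
#include <emmintrin.h> /* SSE2 only; also pulls in _MM_SHUFFLE */
#include <stdint.h>

/* Hypothetical helper: horizontal minimum of 16 unsigned bytes. */
static inline uint8_t hmin_epu8_sse2(__m128i v)
{
    /* _MM_SHUFFLE(1,0,3,2): result dwords (low to high) come from source
       dwords 2,3,0,1, i.e. the two 64-bit halves are swapped. */
    v = _mm_min_epu8(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)));
    /* _MM_SHUFFLE(2,3,0,1): source dwords 1,0,3,2, i.e. adjacent dwords
       are swapped. After this step all four dwords are identical. */
    v = _mm_min_epu8(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1)));
    /* Fold the four remaining bytes inside each dword into byte 0. */
    v = _mm_min_epu8(v, _mm_srli_epi32(v, 16));
    v = _mm_min_epu8(v, _mm_srli_epi32(v, 8));
    return (uint8_t)_mm_cvtsi128_si32(v); /* low byte holds the minimum */
}

/* Hypothetical helper: signed minimum without SSE4.1's _mm_min_epi8.
   XOR with 0x80 maps signed order onto unsigned order and back. */
static inline int8_t hmin_epi8_sse2(__m128i v)
{
    const __m128i sign = _mm_set1_epi8(-128);
    uint8_t m = hmin_epu8_sse2(_mm_xor_si128(v, sign));
    return (int8_t)(uint8_t)(m ^ 0x80);
}
```

Replacing _mm_min_epu8 with _mm_max_epu8 gives the unsigned max in the same way. With SSE4.1 available, _mm_min_epi8 / _mm_max_epi8 can be used directly for signed bytes instead of the sign-bit trick.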


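Alternatively, convert to words and use phminposuw (not tested):

#include "smmintrin.h" // requires SSE4.1

int hminu8(__m128i x)
{
  // Zero-extend the low and high 8 bytes to 16-bit words.
  __m128i l = _mm_unpacklo_epi8(x, _mm_setzero_si128());
  __m128i h = _mm_unpackhi_epi8(x, _mm_setzero_si128());
  // phminposuw puts the minimum word in element 0 (and its index in element 1).
  l = _mm_minpos_epu16(l);
  h = _mm_minpos_epu16(h);
  return _mm_extract_epi16(_mm_min_epu16(l, h), 0);
}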

Alternatively, do a single min/shuffle step to reduce the bytes to words, then finish with one phminposuw (also not tested):

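uint8_t hminu8(__m128i x)
{
  // Each word becomes min(low byte, high byte) with a zero high byte.
  x = _mm_min_epu8(x, _mm_srli_epi16(x, 8));
  // Minimum word lands in bits 0..15, its index in bits 16..18.
  x = _mm_minpos_epu16(x);
  // Truncating to uint8_t keeps only the minimum.
  return (uint8_t)_mm_cvtsi128_si32(x);
}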

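phminposuw only computes a minimum. For max, the same approach can be used by complementing the input and then complementing the result.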
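A minimal sketch of one way to get a maximum out of phminposuw, assuming the complement trick: for unsigned bytes, max(x) = ~min(~x). The function name hmaxu8_sketch and the SKETCH_SSE41 macro are hypothetical. The macro only lets the function compile with GCC or Clang when -msse4.1 is not passed globally, and the CPU still has to support SSE4.1 at run time.

```c
#include <smmintrin.h> /* SSE4.1: _mm_minpos_epu16 */
#include <stdint.h>

/* Assumption: GCC/Clang per-function target attribute; elsewhere the
   file is expected to be compiled with SSE4.1 enabled. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_SSE41 __attribute__((target("sse4.1")))
#else
#define SKETCH_SSE41
#endif

/* Hypothetical helper: horizontal maximum of 16 unsigned bytes. */
SKETCH_SSE41
static inline uint8_t hmaxu8_sketch(__m128i x)
{
    /* Complement every byte: the largest input becomes the smallest. */
    x = _mm_xor_si128(x, _mm_set1_epi8(-1));
    /* Reduce byte pairs to zero-extended words, as in the min version. */
    x = _mm_min_epu8(x, _mm_srli_epi16(x, 8));
    /* Minimum word in bits 0..15, its index in bits 16..18. */
    x = _mm_minpos_epu16(x);
    /* Complement back; truncation to uint8_t drops the index bits. */
    return (uint8_t)~_mm_cvtsi128_si32(x);
}
```

The extra cost over the min version is one XOR on the vector and one NOT on the scalar result.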


Here is a version for 256-bit vectors (AVX2).

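#include "immintrin.h" // requires AVX2

uint8_t max_u8 (const __m256i v)
{
    __m256i gm = v;
    // _mm256_slli_si256 shifts each 128-bit lane separately (a shift by 16
    // would just produce zero), so combine the two lanes first by taking
    // the max against a lane-swapped copy.
    gm = _mm256_max_epu8 (gm, _mm256_permute2x128_si256 (gm, gm, 0x01));
    // Fold within each lane; the top byte of each lane ends up with the max.
    gm = _mm256_max_epu8 (gm, _mm256_slli_si256 (gm, 1));
    gm = _mm256_max_epu8 (gm, _mm256_slli_si256 (gm, 2));
    gm = _mm256_max_epu8 (gm, _mm256_slli_si256 (gm, 4));
    gm = _mm256_max_epu8 (gm, _mm256_slli_si256 (gm, 8));
    return (uint8_t) _mm256_extract_epi8 (gm, 31);
}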

Source: https://habr.com/ru/post/1662904/

