How to convert __m128i to unsigned int with SSE?

I wrote a function for posterizing images.

    // =(
    #define ARGB_COLOR(a, r, g, b) (((a) << 24) | ((r) << 16) | ((g) << 8) | (b))

    inline UINT PosterizeColor(const UINT &color, const float &nColors) {
        __m128 clr = _mm_cvtepi32_ps( _mm_cvtepu8_epi32((__m128i&)color) );
        clr = _mm_mul_ps(clr, _mm_set_ps1(nColors / 255.0f) );
        clr = _mm_round_ps(clr, _MM_FROUND_TO_NEAREST_INT);
        clr = _mm_mul_ps(clr, _mm_set_ps1(255.0f / nColors) );
        __m128i iClr = _mm_cvttps_epi32(clr);
        return ARGB_COLOR(iClr.m128i_u8[12],
                          iClr.m128i_u8[8],
                          iClr.m128i_u8[4],
                          iClr.m128i_u8[0]);
    }

In the first line, I unpack the color into 4 floats, but I can't find the right way to do the reverse.

I looked through the SSE docs and couldn't find the inverse of _mm_cvtepu8_epi32.

Does it exist?

+4
2 answers

Unfortunately, there is no instruction for this, even in AVX (none that I know of). So you will have to do it manually, as you are doing now.

However, your current method is far from optimal, and you rely on .m128i_u8, which is an MSVC extension. In my experience with MSVC, it spills the register to an aligned buffer in memory to access the individual elements. This carries a heavy penalty due to partial-word access.

Instead of .m128i_u8, use _mm_extract_epi32(). It requires SSE4.1, but you already rely on SSE4.1 through _mm_cvtepu8_epi32().
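A minimal sketch of that suggestion, assuming the same lane order as the question (lane 0 = blue, lane 3 = alpha). The helper name PackLanesToArgb and the SSE41_FN macro are hypothetical, and std::uint32_t stands in for the Windows UINT type so the snippet compiles outside MSVC. The macro only lets GCC and Clang build the function without -msse4.1.

```cpp
#include <cstdint>
#include <smmintrin.h>  // SSE4.1: _mm_extract_epi32

// Assumption: enable SSE4.1 per function on GCC/Clang; MSVC needs nothing.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

// Hypothetical helper: takes the low byte of each 32-bit lane and packs
// the four bytes into one value (lane 0 -> bits 0..7, lane 3 -> bits 24..31).
SSE41_FN inline std::uint32_t PackLanesToArgb(__m128i iClr) {
    // The lane index must be a compile-time constant.
    const std::uint32_t b = static_cast<std::uint32_t>(_mm_extract_epi32(iClr, 0)) & 0xFFu;
    const std::uint32_t g = static_cast<std::uint32_t>(_mm_extract_epi32(iClr, 1)) & 0xFFu;
    const std::uint32_t r = static_cast<std::uint32_t>(_mm_extract_epi32(iClr, 2)) & 0xFFu;
    const std::uint32_t a = static_cast<std::uint32_t>(_mm_extract_epi32(iClr, 3)) & 0xFFu;
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

The masking with 0xFF mirrors what reading .m128i_u8[0], [4], [8] and [12] did in the question: only the lowest byte of each lane is kept.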

This situation is especially bad because you are working with 1-byte granularity. If you were working with 2-byte (16-bit integer) granularity instead, there would be an efficient solution using the shuffle intrinsics.

+5

The combination of _mm_shuffle_epi8 and _mm_cvtsi128_si32 is what you need:

    static const __m128i shuffleMask = _mm_setr_epi8(0, 4, 8, 12,
                                                     -1, -1, -1, -1,
                                                     -1, -1, -1, -1,
                                                     -1, -1, -1, -1);
    UINT color = _mm_cvtsi128_si32(_mm_shuffle_epi8(iClr, shuffleMask));
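For context, here is a sketch of how the shuffle could fit into the whole function from the question. It is an illustration under a few assumptions, not the asker's final code: std::uint32_t replaces UINT, the SSE41_FN macro is a hypothetical convenience for GCC/Clang, and the input is loaded with _mm_cvtsi32_si128 rather than the (__m128i&) cast, since that cast reads 16 bytes from a 4-byte object.

```cpp
#include <cstdint>
#include <smmintrin.h>  // SSE4.1; also provides SSSE3 _mm_shuffle_epi8

// Assumption: enable SSE4.1 per function on GCC/Clang; MSVC needs nothing.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

SSE41_FN inline std::uint32_t PosterizeColor(std::uint32_t color, float nColors) {
    // Move the 32-bit color into the low lane, then widen each of its
    // four bytes to a 32-bit integer and convert to float.
    const __m128i packed = _mm_cvtsi32_si128(static_cast<int>(color));
    __m128 clr = _mm_cvtepi32_ps(_mm_cvtepu8_epi32(packed));

    // Quantize each channel to nColors levels and scale back to 0..255.
    clr = _mm_mul_ps(clr, _mm_set1_ps(nColors / 255.0f));
    clr = _mm_round_ps(clr, _MM_FROUND_TO_NEAREST_INT);
    clr = _mm_mul_ps(clr, _mm_set1_ps(255.0f / nColors));
    const __m128i iClr = _mm_cvttps_epi32(clr);

    // Gather the low byte of every 32-bit lane into bytes 0..3;
    // an index of -1 (high bit set) zeroes the destination byte.
    const __m128i shuffleMask = _mm_setr_epi8(0, 4, 8, 12,
                                              -1, -1, -1, -1,
                                              -1, -1, -1, -1,
                                              -1, -1, -1, -1);
    return static_cast<std::uint32_t>(
        _mm_cvtsi128_si32(_mm_shuffle_epi8(iClr, shuffleMask)));
}
```

With nColors = 255.0f both scale factors are exactly 1, so the function returns its input unchanged, which makes a convenient sanity check.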
+8

Source: https://habr.com/ru/post/1387495/

