I am implementing a fast x888 → 565 pixel conversion function in pixman, following the algorithm described by Intel [pdf]. Their code converts x888 → 555, whereas I want to convert to 565. Unfortunately, converting to 565 means that the high bit of each 16-bit result can be set, so I cannot use the signed-saturating pack instructions: they treat the 32-bit inputs as signed and clamp anything above 0x7FFF. The unsigned-saturating pack instruction, packusdw, was not added until SSE4.1. I would like to implement its functionality using SSE2 or find another way to do this.
This function takes two XMM registers containing four 32-bit pixels each and returns one XMM register containing the eight converted RGB565 pixels.
    static force_inline __m128i
    pack_565_2packedx128_128 (__m128i lo, __m128i hi)
    {
        __m128i rb0 = _mm_and_si128 (lo, mask_565_rb);
        __m128i rb1 = _mm_and_si128 (hi, mask_565_rb);

        __m128i t0 = _mm_madd_epi16 (rb0, mask_565_pack_multiplier);
        __m128i t1 = _mm_madd_epi16 (rb1, mask_565_pack_multiplier);

        __m128i g0 = _mm_and_si128 (lo, mask_green);
        __m128i g1 = _mm_and_si128 (hi, mask_green);

        t0 = _mm_or_si128 (t0, g0);
        t1 = _mm_or_si128 (t1, g1);

        t0 = _mm_srli_epi32 (t0, 5);
        t1 = _mm_srli_epi32 (t1, 5);

        return _mm_packus_epi32 (t0, t1);
    }
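For checking any SSE2 replacement, a scalar reference for the intended per-pixel result may help. This is only a sketch: it assumes the usual x8r8g8b8 layout (0xXXRRGGBB) and truncation to the top 5/6/5 bits of each channel, and the name pack_565_scalar is made up for illustration.

```c
#include <stdint.h>

/* Scalar reference: x8r8g8b8 (0xXXRRGGBB, assumed layout) to r5g6b5. */
static inline uint16_t pack_565_scalar(uint32_t p)
{
    return (uint16_t)(((p >> 8) & 0xf800)    /* top 5 bits of red   -> bits 11..15 */
                    | ((p >> 5) & 0x07e0)    /* top 6 bits of green -> bits 5..10  */
                    | ((p >> 3) & 0x001f));  /* top 5 bits of blue  -> bits 0..4   */
}
```

Any pixel with the top bit of red set, such as 0x00808080, gives a result of 0x8000 or above, which is exactly the case a signed-saturating pack mishandles.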
Ideas I thought of:
Subtract 0x8000, pack with _mm_packs_epi32, then add 0x8000 back to each 565 pixel. I tried this, but I could not get it to work.
    t0 = _mm_sub_epi16 (t0, mask_8000);
    t1 = _mm_sub_epi16 (t1, mask_8000);
    t0 = _mm_packs_epi32 (t0, t1);
    return _mm_add_epi16 (t0, mask_8000);
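One possible reason the attempt above fails, assuming mask_8000 holds 0x8000 in every 16-bit lane: _mm_sub_epi16 then also changes the upper half of each 32-bit lane, so the values are far out of range before the pack. A minimal sketch of the same idea with a 32-bit subtract follows; the names packus_epi32_bias and pack8_bias are hypothetical, and the inputs are assumed to lie in [0, 0xFFFF] already, as they do after the shift by 5.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Sketch: packs eight 32-bit values, assumed to be in [0, 0xFFFF],
 * into eight 16-bit values using only SSE2. */
static inline __m128i packus_epi32_bias(__m128i t0, __m128i t1)
{
    const __m128i bias32 = _mm_set1_epi32(0x8000);   /* one bias per 32-bit lane */
    const __m128i bias16 = _mm_set1_epi16(-32768);   /* 0x8000 per 16-bit lane   */

    /* Move [0, 0xFFFF] into [-0x8000, 0x7FFF] so the signed pack is lossless. */
    t0 = _mm_sub_epi32(t0, bias32);
    t1 = _mm_sub_epi32(t1, bias32);
    t0 = _mm_packs_epi32(t0, t1);

    /* Undo the bias; 16-bit wraparound restores the original value. */
    return _mm_add_epi16(t0, bias16);
}

/* Hypothetical helper: packs eight values from memory. */
static void pack8_bias(const uint32_t src[8], uint16_t dst[8])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)src);
    __m128i hi = _mm_loadu_si128((const __m128i *)(src + 4));
    _mm_storeu_si128((__m128i *)dst, packus_epi32_bias(lo, hi));
}
```

This costs two subtracts and one add on top of the pack, and needs two different constants because the bias is applied at 32-bit width before the pack and at 16-bit width after it.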
Shuffle the data instead of packing it. This works for MMX, but since the SSE 16-bit shuffles operate only on the high or low 64 bits, it would be messy.
Save the high bits, set them to zero, do the pack, then restore them afterwards. This seems pretty messy.
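A variation on the last idea that avoids saving and restoring anything, offered as a sketch rather than a definitive answer: sign-extend each 16-bit result within its 32-bit lane, so that the signed-saturating pack passes the low 16 bits through unchanged. The constant values below are assumptions (the question does not list them), and the names pack_565_sse2_sketch and convert8_565 are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

static inline __m128i pack_565_sse2_sketch(__m128i lo, __m128i hi)
{
    /* Assumed values for the masks used in the question. */
    const __m128i mask_rb    = _mm_set1_epi32(0x00f800f8);
    const __m128i multiplier = _mm_set1_epi32(0x20000004);
    const __m128i mask_g     = _mm_set1_epi32(0x0000fc00);

    /* (R & 0xf8) * 0x2000 + (B & 0xf8) * 4 puts R5 at bit 16 and B5 at bit 5. */
    __m128i t0 = _mm_madd_epi16(_mm_and_si128(lo, mask_rb), multiplier);
    __m128i t1 = _mm_madd_epi16(_mm_and_si128(hi, mask_rb), multiplier);

    /* G6 already sits at bit 10, so the 565 value now occupies bits 5..20. */
    t0 = _mm_or_si128(t0, _mm_and_si128(lo, mask_g));
    t1 = _mm_or_si128(t1, _mm_and_si128(hi, mask_g));

    /* Instead of shifting right by 5, shift left by 11 so the 565 value
     * fills bits 16..31, then shift right arithmetically by 16. Each lane
     * now holds the 16-bit result sign-extended to 32 bits. */
    t0 = _mm_srai_epi32(_mm_slli_epi32(t0, 16 - 5), 16);
    t1 = _mm_srai_epi32(_mm_slli_epi32(t1, 16 - 5), 16);

    /* Every lane is in [-0x8000, 0x7FFF], so no saturation occurs. */
    return _mm_packs_epi32(t0, t1);
}

/* Hypothetical helper: converts eight x8r8g8b8 pixels from memory. */
static void convert8_565(const uint32_t src[8], uint16_t dst[8])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)src);
    __m128i hi = _mm_loadu_si128((const __m128i *)(src + 4));
    _mm_storeu_si128((__m128i *)dst, pack_565_sse2_sketch(lo, hi));
}
```

Because the left shift replaces the original right shift by 5, this adds only one extra shift per register compared with the SSE4.1 version, and it needs no additional constants.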
Are there other (hopefully more efficient) ways I could do this?