How to load two sets of 4 shorts into the XMM register?

I'm just starting out with SSE intrinsics using Visual C++ 2012, and I need some pointers (no pun intended).

I have two arrays containing 4 signed shorts each (each array is thus 64 bits, 128 bits in total). I want to load one into the upper 64 bits of an XMM register and the other into the lower 64 bits. Can I do this efficiently using the built-in SSE intrinsics? If so, how?

1 answer

SSE2:

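    #include <emmintrin.h>  // SSE2

    short A[] = {0,1,2,3};
    short B[] = {4,5,6,7};
    __m128i a, b, v;
    a = _mm_loadl_epi64((const __m128i*)A);  // low 64 bits = A, high 64 bits = 0
    b = _mm_loadl_epi64((const __m128i*)B);  // low 64 bits = B, high 64 bits = 0
    v = _mm_unpacklo_epi64(a, b);            // v = {0,1,2,3,4,5,6,7}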
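
For a quick check of the lane order, the SSE2 sequence above can be wrapped in a function and the result stored back to memory. This is only a minimal sketch, assuming an x86 target with SSE2 enabled; the helper names `load_two_halves` and `store_lanes` are made up for illustration and do not come from any library.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: puts lo[0..3] into bits 0..63 and hi[0..3] into bits 64..127.
static inline __m128i load_two_halves(const short* lo, const short* hi) {
    assert(lo != nullptr && hi != nullptr);
    // Each 64-bit load fills the low half of a register and zeroes the high half.
    __m128i a = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(lo));
    __m128i b = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(hi));
    // Result: low 64 bits taken from a, high 64 bits taken from the low half of b.
    return _mm_unpacklo_epi64(a, b);
}

// Hypothetical helper: writes all eight 16-bit lanes to out[0..7], lowest lane first.
static inline void store_lanes(__m128i v, short* out) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);  // unaligned store
}
```

With the arrays from the answer, `store_lanes(load_two_halves(A, B), out)` should leave `out` holding 0 through 7 in order. To get the other arrangement (A in the upper half), swap the two arguments.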

SSE4.1 + x64:

    #include <smmintrin.h>  // SSE4.1

    short A[] = {0,1,2,3};
    short B[] = {4,5,6,7};
    __m128i v;
    v = _mm_loadl_epi64((const __m128i*)A);            // low 64 bits = A, high 64 bits = 0
    v = _mm_insert_epi64(v, *(const long long*)B, 1);  // high 64 bits = B; v = {0,1,2,3,4,5,6,7}

Note that neither A nor B has any alignment requirement, but I would recommend aligning both of them to 8 bytes.
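
The alignment remark can be tried out with a small sketch like the one below, assuming an x86 target with SSE2. The names `is_aligned8`, `load4_any` and `misaligned_load_matches` are hypothetical helpers written for this illustration. One plausible reason for the recommendation is that an 8-byte load from an 8-byte-aligned address can never straddle a cache-line boundary.

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: true when p sits on an 8-byte boundary.
static inline bool is_aligned8(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 7u) == 0;
}

// Hypothetical helper: loads four shorts from any address into the low
// 64 bits of a register; the high 64 bits are zeroed.
static inline __m128i load4_any(const void* p) {
    return _mm_loadl_epi64(static_cast<const __m128i*>(p));
}

// Hypothetical demo: copies four shorts to an address that is deliberately
// not 8-byte aligned and checks that the 64-bit load still reads them back.
static inline bool misaligned_load_matches() {
    const short src[4] = {10, 20, 30, 40};
    unsigned char raw[16];
    unsigned char* p = raw;
    if (is_aligned8(p)) ++p;           // force an address off the 8-byte boundary
    std::memcpy(p, src, sizeof(src));  // 8 bytes, stays inside raw

    short out[8];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), load4_any(p));
    // Low four lanes must equal src; high four lanes are zeroed by the load.
    return std::memcmp(out, src, sizeof(src)) == 0 &&
           out[4] == 0 && out[5] == 0 && out[6] == 0 && out[7] == 0;
}
```

A check such as `is_aligned8(A)` could be placed in a debug assertion if the 8-byte recommendation is adopted; how the arrays get that alignment (for example a compiler-specific alignment attribute) is left open here.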


Source: https://habr.com/ru/post/943670/

