SSE2:
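    short A[] = {0,1,2,3};
    short B[] = {4,5,6,7};
    __m128i a, b, v;
    a = _mm_loadl_epi64((const __m128i*)A);  /* low 64 bits = A, high 64 bits zeroed */
    b = _mm_loadl_epi64((const __m128i*)B);  /* low 64 bits = B, high 64 bits zeroed */
    v = _mm_unpacklo_epi64(a, b);            /* v = {A[0..3], B[0..3]} */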
SSE4.1 + x64:
    short A[] = {0,1,2,3};
    short B[] = {4,5,6,7};
    __m128i v;
    v = _mm_loadl_epi64((const __m128i*)A);            /* low 64 bits = A */
    v = _mm_insert_epi64(v, *(const long long*)B, 1);  /* high 64 bits = B */
Note that neither A nor B has any alignment requirement here, but I would recommend aligning both of them to 8 bytes.
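For reference, the SSE2 variant can be wrapped in a small function and the result stored back to memory to check it. This is only a sketch: the name `combine_halves` and the `out` parameter are made up for illustration, and it uses an unaligned store so `out` needs no particular alignment.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: copies lo[0..3] into out[0..3] and hi[0..3]
   into out[4..7] by going through a single 128-bit register. */
void combine_halves(const short *lo, const short *hi, short out[8])
{
    __m128i a = _mm_loadl_epi64((const __m128i *)lo);  /* low 64 bits = lo, high zeroed */
    __m128i b = _mm_loadl_epi64((const __m128i *)hi);  /* low 64 bits = hi, high zeroed */
    __m128i v = _mm_unpacklo_epi64(a, b);              /* low half from a, high half from b */
    _mm_storeu_si128((__m128i *)out, v);               /* unaligned 16-byte store */
}
```

Called with the A and B arrays from above, `out` should end up holding 0 through 7 in order.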