What is the best way to load two unaligned 64-bit values into an SSE register with SSSE3?

I have two pointers to two unaligned 8-byte chunks that I want to load into one xmm register. If possible, I'd like to use intrinsics, and ideally without an auxiliary register. No pinsrd (SSSE3 only, Core 2).

2 answers

From the MSVC documentation, it looks like you can do the following (with ptra and ptrb declared as const double *):

 __m128d xx = _mm_setzero_pd(); // start from a defined value rather than an uninitialised register
 xx = _mm_loadh_pd(xx, ptra);   // load the higher 64 bits from (unaligned) ptra
 xx = _mm_loadl_pd(xx, ptrb);   // load the lower 64 bits from (unaligned) ptrb
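A self-contained sketch of the same idea wrapped in small functions. The names load_two_unaligned_pd and load_two_unaligned_epi64 are hypothetical. This variant starts with _mm_load_sd (movsd), which already zeroes the upper half, so no separate zeroing step is needed; the integer variant pairs _mm_loadl_epi64 (movq) with _mm_loadh_pi (movhps). All of these are 8-byte loads with no alignment requirement, and each function needs only one xmm register.

```c
#include <emmintrin.h>  /* SSE2: __m128d, __m128i, _mm_load_sd, _mm_loadh_pd, _mm_loadl_epi64 */

/* Hypothetical helper: *lo goes to bits 0..63, *hi to bits 64..127.
   Neither pointer needs to be aligned. */
static __m128d load_two_unaligned_pd(const double *lo, const double *hi)
{
    __m128d x = _mm_load_sd(lo);   /* movsd: lower half = *lo, upper half zeroed */
    return _mm_loadh_pd(x, hi);    /* movhpd: upper half = *hi, lower half kept */
}

/* Hypothetical helper: the same thing for 64-bit integers. */
static __m128i load_two_unaligned_epi64(const void *lo, const void *hi)
{
    __m128i x = _mm_loadl_epi64((const __m128i *)lo);  /* movq: lower half = *lo, upper zeroed */
    /* movhps: replace the upper half with the 8 bytes at hi; the casts are free */
    return _mm_castps_si128(_mm_loadh_pi(_mm_castsi128_ps(x), (const __m64 *)hi));
}
```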

In my experience, loading from unaligned memory is much slower than loading from aligned pointers, so if you really need performance you won't want to do this kind of operation too often.

Hope this helps.


Unaligned access is much slower than aligned access (at least before Nehalem); you can get higher speed by loading the aligned 128-bit words that contain the desired unaligned 64-bit words and then shuffling them to build the result.

Assumptions:

  • reading the entire aligned 128-bit word around each value is permitted
  • the 64-bit words are aligned to at least 32-bit (4-byte) boundaries

E.g. (not verified):

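 uintptr_t aoff = (uintptr_t)ptra & 15;  // byte offset of ptra inside its aligned 16-byte block
 uintptr_t boff = (uintptr_t)ptrb & 15;
 __m128 va = _mm_load_ps((const float *)((const char *)ptra - aoff));  // aligned load of the enclosing block
 __m128 vb = _mm_load_ps((const float *)((const char *)ptrb - boff));
 switch ((aoff << 4) | boff) {  // _mm_shuffle_ps needs a compile-time mask, hence one case per offset pair
 case 0: _mm_shuffle_ps(va, vb, ...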

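The number of cases depends on whether you can guarantee 64-bit alignment. With 64-bit alignment each offset is 0 or 8; with only 32-bit alignment it can be 0, 4, 8 or 12, and a word at offset 12 straddles two 16-byte blocks, so it needs a second aligned load.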
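A fuller sketch of the aligned-load-and-shuffle idea, assuming both pointers are at least 4-byte aligned. The helper names lo64_from_aligned_blocks and load_two_via_aligned are hypothetical. Instead of one switch over (aoff<<4) | boff, it moves each 64-bit word to the bottom of its own register with a small per-pointer switch and then joins the two halves with _mm_movelh_ps; the offset-12 case uses a second aligned load and two shufps. Only SSE1 intrinsics are used. Reading a whole aligned block touches bytes outside the object, which is not strictly allowed in C, although an aligned 16-byte load can never cross a page boundary.

```c
#include <stdint.h>     /* uintptr_t */
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_shuffle_ps, _mm_movehl_ps, _mm_movelh_ps */

/* Hypothetical helper: return the 8 bytes at p in the low half of the result,
   using only 16-byte aligned loads. Assumes p is 4-byte aligned. */
static __m128 lo64_from_aligned_blocks(const void *p)
{
    uintptr_t off = (uintptr_t)p & 15;  /* 0, 4, 8 or 12 under the 4-byte assumption */
    const float *base = (const float *)((const char *)p - off);
    __m128 blk = _mm_load_ps(base);     /* aligned block holding the first byte of *p */

    switch (off) {
    case 0:
        return blk;                                                /* dwords 0,1 already at the bottom */
    case 4:
        return _mm_shuffle_ps(blk, blk, _MM_SHUFFLE(3, 3, 2, 1));  /* dwords 1,2 -> 0,1 */
    case 8:
        return _mm_movehl_ps(blk, blk);                            /* dwords 2,3 -> 0,1 */
    default: {
        /* off == 12 (other values break the 4-byte assumption): the word is
           dword 3 of this block plus dword 0 of the next one */
        __m128 next = _mm_load_ps(base + 4);
        __m128 t = _mm_shuffle_ps(blk, next, _MM_SHUFFLE(0, 0, 3, 3));  /* [b3, b3, n0, n0] */
        return _mm_shuffle_ps(t, t, _MM_SHUFFLE(2, 2, 2, 0));           /* [b3, n0, n0, n0] */
    }
    }
}

/* Hypothetical helper: *ptra in bits 0..63, *ptrb in bits 64..127. */
static __m128 load_two_via_aligned(const void *ptra, const void *ptrb)
{
    return _mm_movelh_ps(lo64_from_aligned_blocks(ptra), lo64_from_aligned_blocks(ptrb));
}
```

The shuffles only move bits, so the result is exact for integer data as well; cast it with _mm_castps_si128 or _mm_castps_pd as needed.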


Source: https://habr.com/ru/post/896055/

