Saving Individual Doubles from Packed Double Vector Using Intel AVX

I am writing code using the C intrinsics for Intel AVX instructions. Given a packed double vector (a __m256d), what is the most efficient way (i.e. the fewest operations) to store each of its elements at a different place in memory, so that they are no longer packed? Pseudocode:

    __m256d *src;
    double *dst;
    int dst_dist;
    dst[0] = src[0];
    dst[dst_dist] = src[1];
    dst[2 * dst_dist] = src[2];
    dst[3 * dst_dist] = src[3];

With SSE, I could do this for __m128 types using the intrinsics _mm_storel_pi and _mm_storeh_pi. I could not find anything similar for AVX that lets me store individual 64-bit elements to memory. Does such a thing exist?

1 answer

You can do this with a couple of extract instructions: (warning: untested)

    __m256d src = ...;  // data
    __m128d a = _mm256_extractf128_pd(src, 0);
    __m128d b = _mm256_extractf128_pd(src, 1);
    _mm_storel_pd(dst + 0*dst_dist, a);
    _mm_storeh_pd(dst + 1*dst_dist, a);
    _mm_storel_pd(dst + 2*dst_dist, b);
    _mm_storeh_pd(dst + 3*dst_dist, b);
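For anyone who wants to try this, here is a minimal self-contained sketch of the same extract-and-store approach. The helper names (store_strided_pd, store_strided_from_array), the AVX_TARGET macro and the ptrdiff_t stride are my own assumptions for illustration and are not part of any library. It takes the low half with _mm256_castpd256_pd128 instead of _mm256_extractf128_pd(src, 0), since the cast normally compiles to no instruction.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Assumption of this sketch: lets GCC/Clang compile the AVX code even
   without -mavx. Other compilers (e.g. MSVC) need no attribute. */
#if defined(__GNUC__) || defined(__clang__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

/* Hypothetical helper: stores lane i of v to dst[i * stride], i = 0..3. */
AVX_TARGET
static void store_strided_pd(__m256d v, double *dst, ptrdiff_t stride)
{
    assert(dst != NULL);
    __m128d lo = _mm256_castpd256_pd128(v);   /* lanes 0 and 1 */
    __m128d hi = _mm256_extractf128_pd(v, 1); /* lanes 2 and 3 */
    _mm_storel_pd(dst + 0 * stride, lo);      /* lane 0 */
    _mm_storeh_pd(dst + 1 * stride, lo);      /* lane 1 */
    _mm_storel_pd(dst + 2 * stride, hi);      /* lane 2 */
    _mm_storeh_pd(dst + 3 * stride, hi);      /* lane 3 */
}

/* Hypothetical wrapper: loads four doubles from src (unaligned is fine)
   and scatters them with the helper above. */
AVX_TARGET
static void store_strided_from_array(const double *src, double *dst,
                                     ptrdiff_t stride)
{
    store_strided_pd(_mm256_loadu_pd(src), dst, stride);
}
```

Calling store_strided_from_array(src, dst, 3) with src = {1, 2, 3, 4} writes those values to dst[0], dst[3], dst[6] and dst[9] and leaves the elements in between untouched. That is one extract plus four 64-bit stores per vector.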

What you really want is a scatter instruction. AVX has none; AVX2 adds only gather, and scatter stores do not appear until AVX-512.


Source: https://habr.com/ru/post/1385464/
