How to access the components of a 256-bit ps vector

Question

How can I efficiently access the elements of a 256-bit vector? For example, I computed a dot product using

c = _mm256_dp_ps(a, b, 0xff);

How do I access the values in c? I need both the high part and the low part. Do I understand correctly that I first need to extract the 128-bit halves as follows:

 r0 = _mm256_extractf128_ps(c, 0);
 r1 = _mm256_extractf128_ps(c, 1);

And only then extract the float:

 _MM_EXTRACT_FLOAT(fr0, r0, 0);
 _MM_EXTRACT_FLOAT(fr1, r1, 0);
 return fr0 + fr1;

+4

c sse avx intrinsics

Nikolay Shmyrev Oct 21 '12 at 17:31


2 answers

Well, you can just store the vector to memory and then work with scalars:

 float v[8];
 _mm256_storeu_ps(v, _mm256_dp_ps(a, b, 0xff)); // unaligned store: v is not guaranteed to be 32-byte aligned
 float result = v[0] + v[4];
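As a self-contained sketch of the store-to-memory approach, assuming the inputs arrive as plain float arrays: the helper name dot8_store and its pointer-based signature are made up for illustration, and the target attribute is a GCC/Clang extension used here only so the file builds without -mavx.

```c
#include <immintrin.h>

/* Hypothetical helper: 8-wide dot product of two float arrays.
   The target attribute (GCC/Clang only) can be dropped when building with -mavx. */
__attribute__((target("avx")))
float dot8_store(const float *a, const float *b)
{
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);

    /* Each 128-bit lane now holds its own 4-wide dot product in every slot. */
    __m256 c = _mm256_dp_ps(va, vb, 0xff);

    float v[8];
    _mm256_storeu_ps(v, c);   /* unaligned store, v has no alignment guarantee */
    return v[0] + v[4];       /* low-lane sum + high-lane sum */
}
```

The unaligned store avoids the 32-byte alignment requirement that a plain pointer cast to __m256 would impose on the local array.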

You can also swap the upper and lower halves of the 256-bit register and add them, for example:

 __m256 c = _mm256_dp_ps(a, b, 0xff);
 __m256 d = _mm256_permute2f128_ps(c, c, 1);
 __m256 result = _mm256_add_ps(c, d);
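A possible complete version of the swap-and-add variant, which also shows one way to get the scalar out of the register without going through memory. The helper name dot8_permute and the array-based signature are assumptions for illustration; the target attribute is a GCC/Clang extension that can be dropped when compiling with -mavx.

```c
#include <immintrin.h>

/* Hypothetical helper: 8-wide dot product, reduced inside the register. */
__attribute__((target("avx")))
float dot8_permute(const float *a, const float *b)
{
    __m256 c = _mm256_dp_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b), 0xff);

    /* Control value 1 swaps the two 128-bit halves of c. */
    __m256 d = _mm256_permute2f128_ps(c, c, 1);

    /* After the add, every element holds low-lane sum + high-lane sum. */
    __m256 s = _mm256_add_ps(c, d);

    /* The cast to __m128 is free; _mm_cvtss_f32 reads element 0. */
    return _mm_cvtss_f32(_mm256_castps256_ps128(s));
}
```

Reading element 0 through _mm256_castps256_ps128 and _mm_cvtss_f32 also answers the original question for the low part; the high part still needs _mm256_extractf128_ps(c, 1) first.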

Probably much faster than either of these options is to compute four 8-wide dot products at the same time and combine them. Sketch:

 d0 = _mm256_dp_ps(a[0], b[0], 0xff);
 d1 = _mm256_dp_ps(a[1], b[1], 0xff);
 d2 = _mm256_dp_ps(a[2], b[2], 0xff);
 d3 = _mm256_dp_ps(a[3], b[3], 0xff);
 // _mm256_permute_ps takes a single source; combining two vectors needs _mm256_shuffle_ps
 d01 = _mm256_shuffle_ps(d0, d1, ...);
 d23 = _mm256_shuffle_ps(d2, d3, ...);
 d0123 = _mm256_shuffle_ps(d01, d23, ...);
 d0123upper = _mm256_permute2f128_ps(d0123, d0123, 1);
 d = _mm256_add_ps(d0123upper, d0123);
 // lower 128 bits contain the results of 4 8-wide dot products
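One way the sketch could be filled in, under stated assumptions: the inputs are four rows of 8 floats stored back to back, the combining step uses _mm256_shuffle_ps, and the selector constants 0x00 and 0x88 are my choice rather than the answer's. The helper name dot8_x4 is hypothetical, and the target attribute is a GCC/Clang extension that can be dropped when compiling with -mavx.

```c
#include <immintrin.h>

/* Hypothetical helper: out[i] = dot(a + 8*i, b + 8*i) for i = 0..3. */
__attribute__((target("avx")))
void dot8_x4(const float *a, const float *b, float out[4])
{
    __m256 d0 = _mm256_dp_ps(_mm256_loadu_ps(a),      _mm256_loadu_ps(b),      0xff);
    __m256 d1 = _mm256_dp_ps(_mm256_loadu_ps(a + 8),  _mm256_loadu_ps(b + 8),  0xff);
    __m256 d2 = _mm256_dp_ps(_mm256_loadu_ps(a + 16), _mm256_loadu_ps(b + 16), 0xff);
    __m256 d3 = _mm256_dp_ps(_mm256_loadu_ps(a + 24), _mm256_loadu_ps(b + 24), 0xff);

    /* Every slot of a lane holds the same partial sum, so selector 0x00
       is enough: each lane becomes [d0 d0 d1 d1] (resp. [d2 d2 d3 d3]). */
    __m256 d01 = _mm256_shuffle_ps(d0, d1, 0x00);
    __m256 d23 = _mm256_shuffle_ps(d2, d3, 0x00);

    /* 0x88 takes elements 0 and 2 of each source: lanes become [d0 d1 d2 d3]. */
    __m256 d0123 = _mm256_shuffle_ps(d01, d23, 0x88);

    /* Add the high-lane partial sums onto the low-lane ones. */
    __m256 upper = _mm256_permute2f128_ps(d0123, d0123, 1);
    __m256 d = _mm256_add_ps(upper, d0123);

    /* The lower 128 bits hold the four finished dot products. */
    _mm_storeu_ps(out, _mm256_castps256_ps128(d));
}
```

The point of batching is that the cross-lane swap and the final store are paid once for four results instead of once per dot product.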

+4

Alex I Nov 30 '12 at 8:59


Nikolay Shmyrev · Accepted Answer · 2012-11-30T09:36:06+0000

There is no efficient way to do this. The dp_ps operation itself is slow, and the subsequent extraction is slow too. If you cannot process more data per batch, it is faster to use SSE4.1 instructions to compute the dot product and work with 128-bit vectors rather than 256-bit ones.
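A minimal sketch of what the 128-bit route might look like for one 8-element dot product, assuming plain float arrays as input. The helper name dot8_sse41 is hypothetical, the target attribute is a GCC/Clang extension that can be dropped when compiling with -msse4.1, and whether this actually beats the 256-bit version depends on the CPU, so it is worth measuring.

```c
#include <smmintrin.h>

/* Hypothetical helper: 8-wide dot product using two 128-bit _mm_dp_ps calls. */
__attribute__((target("sse4.1")))
float dot8_sse41(const float *a, const float *b)
{
    /* Mask 0xf1: multiply all four pairs, write the sum to element 0 only. */
    __m128 lo = _mm_dp_ps(_mm_loadu_ps(a),     _mm_loadu_ps(b),     0xf1);
    __m128 hi = _mm_dp_ps(_mm_loadu_ps(a + 4), _mm_loadu_ps(b + 4), 0xf1);

    /* Scalar add of the two partial sums, then read element 0. */
    return _mm_cvtss_f32(_mm_add_ss(lo, hi));
}
```

No lane extraction is needed here because both partial sums already sit in element 0 of a 128-bit register.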
