AVX2 float compare: getting 0.0 or 1.0 instead of all-zero or all-one bits

Basically, in the resulting vector I want to store 1.0 for every input value > 1, and 0.0 for every input value <= 1. Here is my code:

    float f[8] = {1.2, 0.5, 1.7, 1.9, 0.34, 22.9, 18.6, 0.7};
    float r[8]; // Must be {1, 0, 1, 1, 0, 1, 1, 0}

    __m256i tmp1 = _mm256_cvttps_epi32(_mm256_loadu_ps(f));
    __m256i tmp2 = _mm256_cmpgt_epi32(tmp1, _mm256_set1_epi32(1));
    _mm256_store_ps(r, _mm256_cvtepi32_ps(tmp2));

    for(int i = 0; i < 8; i++)
        std::cout << f[i] << " : " << r[i] << std::endl;

But I don't get the right results; this is what I get. Why don't the AVX2 comparison operations work for me?

    1.2 : 0
    0.5 : 0
    1.7 : 0
    1.9 : 0
    0.34 : 0
    22.9 : -1
    18.6 : -1
    0.7 : 0
2 answers

I think _mm256_cmp_ps is a better fit for your problem. I wrote the following program for it; it does a bit more than you asked for. If you want to store 1s, set all mask elements to 1.0, but if you want to store a different number, you can set the mask to whatever value you want.

    //gcc 6.2, Linux-mint, Skylake. Compile with -mavx (or -march=native).
    #include <stdio.h>
    #include <x86intrin.h>

    float __attribute__(( aligned(32))) f[8] = {1.2, 0.5, 1.7, 1.9, 0.34, 22.9, 18.6, 1.0};
    // float __attribute__(( aligned(32))) r[8]; // Must be {1, 0, 1, 1, 0, 1, 1, 0}
    // in C++11, use alignas(32). Or C11 _Alignas(32), instead of GNU C __attribute__.

    void printVecps(__m256 vec)
    {
        float tempps[8];
        _mm256_storeu_ps(&tempps[0], vec);  // unaligned store: tempps is not guaranteed to be 32-byte aligned
        printf(" [0]=%3.2f, [1]=%3.2f, [2]=%3.2f, [3]=%3.2f, [4]=%3.2f, [5]=%3.2f, [6]=%3.2f, [7]=%3.2f \n",
               tempps[0], tempps[1], tempps[2], tempps[3], tempps[4], tempps[5], tempps[6], tempps[7]);
    }

    int main()
    {
        __m256 mask = _mm256_set1_ps(1.0), vec1, vec2, vec3;

        vec1 = _mm256_load_ps(&f[0]);                          // load vector values from f[0]-f[7]
        printf("vec1 : "); printVecps(vec1);

        vec2 = _mm256_cmp_ps(mask, vec1, _CMP_LT_OS /*0x1*/);  // compare them to mask (less)
        printf("vec2 : "); printVecps(vec2);

        vec3 = _mm256_min_ps(vec2, mask);                      // select minimum from mask and compared results
        printf("vec3 : "); printVecps(vec3);

        return 0;
    }

The output for mask = {1,1,1,1,1,1,1,1}:

    vec1 : [0]=1.20, [1]=0.50, [2]=1.70, [3]=1.90, [4]=0.34, [5]=22.90, [6]=18.60, [7]=1.00
    vec2 : [0]=-nan, [1]=0.00, [2]=-nan, [3]=-nan, [4]=0.00, [5]=-nan, [6]=-nan, [7]=0.00
    vec3 : [0]=1.00, [1]=0.00, [2]=1.00, [3]=1.00, [4]=0.00, [5]=1.00, [6]=1.00, [7]=0.00

And for mask = {2,2,2,2,2,2,2,2}:

    vec1 : [0]=1.20, [1]=0.50, [2]=1.70, [3]=1.90, [4]=0.34, [5]=22.90, [6]=18.60, [7]=1.00
    vec2 : [0]=0.00, [1]=0.00, [2]=0.00, [3]=0.00, [4]=0.00, [5]=-nan, [6]=-nan, [7]=0.00
    vec3 : [0]=0.00, [1]=0.00, [2]=0.00, [3]=0.00, [4]=0.00, [5]=2.00, [6]=2.00, [7]=0.00

This relies on the non-commutative behavior of _mm256_min_ps with NaN to replace the NaN elements with 1.0: min(NaN, 1.0) evaluates as (NaN < 1.0) ? NaN : 1.0 = 1.0, because any ordered comparison involving NaN is always false.
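To see why the operand order matters, here is a minimal scalar model of one lane of `minps` / `_mm256_min_ps`, assuming the Intel-documented rule `dst = (a < b) ? a : b`, so the second operand is returned whenever either input is NaN. `minps_lane` and `all_ones_float` are hypothetical helpers for illustration, not intrinsics.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of one lane of minps / _mm256_min_ps:
// returns a only if a < b; otherwise (including when a or b is NaN) returns b.
float minps_lane(float a, float b) {
    return (a < b) ? a : b;
}

// The bit pattern _mm256_cmp_ps writes for a "true" lane: all 32 bits set.
// Read as a float, 0xFFFFFFFF is a NaN with the sign bit set (printed as "-nan" by glibc).
float all_ones_float() {
    std::uint32_t bits = 0xFFFFFFFFu;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

With the mask as the first operand, `minps_lane(all_ones_float(), 1.0f)` yields 1.0f; with the operands swapped the NaN survives, which is exactly the reordering the compiler must not do.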

Beware that gcc before 7.0 treats the built-in 128-bit _mm_min_ps as commutative, even without -ffast-math (although it knows that the minps instruction is not). Use an up-to-date gcc, or make sure gcc compiles your code with the operands in the order this algorithm requires (or use clang). It's possible that gcc never swaps the operands with AVX, only with SSE (to avoid an extra movapd instruction), but the safest option is gcc 7 or later.


When a float is converted to int with _mm256_cvttps_epi32, the returned integer is truncated (rounded toward zero). That is, 1.2, 1.7 and 1.9 are all converted to 1, so they are not greater than 1.
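The truncation can be reproduced with plain scalar code: `static_cast` from float to a 32-bit int also rounds toward zero, which matches what `_mm256_cvttps_epi32` does per lane for in-range values. `int_path_gt1` and `float_path_gt1` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the question's integer route for one lane:
// truncate toward zero (like _mm256_cvttps_epi32), then compare against 1.
bool int_path_gt1(float x) {
    std::int32_t t = static_cast<std::int32_t>(x);  // 1.9f -> 1, 0.7f -> 0
    return t > 1;                                    // 1 > 1 is false
}

// Comparing the floats directly keeps the fractional part.
bool float_path_gt1(float x) {
    return x > 1.0f;
}
```

On the question's inputs only 22.9 and 18.6 survive the integer route, matching the output above, while the float comparison accepts 1.2, 1.7 and 1.9 as intended.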

The output of _mm256_cmpgt_epi32 is not 1 but "all ones"; from the docs:

... if the data element s1 is larger than the corresponding element in s2, then the corresponding element in the target vector is set to all 1s.

All ones, interpreted as a two's complement integer, is -1, as your results show.
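A small scalar sketch of what happens to a "true" lane in the question's code: the all-ones pattern reads back as the int32 value -1, and `_mm256_cvtepi32_ps` then converts it numerically to -1.0f, which is the `-1` printed for 22.9 and 18.6. If you wanted to stay on the integer path, one option (my assumption, not from the answer) is a logical right shift by 31, which `_mm256_srli_epi32(mask, 31)` would do per lane; it fixes the -1 but not the truncation problem. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar model of one "true" lane produced by _mm256_cmpgt_epi32: all 32 bits set.
std::int32_t true_lane_as_int() {
    std::uint32_t bits = 0xFFFFFFFFu;
    std::int32_t v;
    std::memcpy(&v, &bits, sizeof v);  // two's complement: all ones == -1
    return v;
}

// What _mm256_cvtepi32_ps does per lane: a numeric int -> float conversion.
float cvt_lane(std::int32_t v) {
    return static_cast<float>(v);
}

// Logical right shift by 31 (per lane, like _mm256_srli_epi32(mask, 31)):
// all-ones -> 1, all-zeros -> 0.
std::int32_t mask_to_01(std::int32_t mask) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(mask) >> 31);
}
```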

Off topic:

  • Why are you using an unaligned load and an aligned store?
  • You should take a look at _mm256_cmp_ps
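Following that suggestion, here is a minimal sketch of the question's code using `_mm256_cmp_ps` directly on the floats, with an unaligned load and store so the arrays need no special alignment. Instead of the `min` trick from the first answer, it ANDs the compare mask with the output value: all-ones AND 1.0f keeps the bits of 1.0f, and all-zeros gives +0.0f, so the result does not depend on operand order. The `threshold_select` name and the GCC/Clang `target("avx")` attribute (so the file builds without `-mavx`) are assumptions of this sketch; it still needs a CPU with AVX at run time.

```cpp
#include <immintrin.h>
#include <cassert>

// Writes value to out[i] where in[i] > threshold, and 0.0f elsewhere, for 8 floats.
// target("avx") lets GCC/Clang compile this one function with AVX enabled.
__attribute__((target("avx")))
void threshold_select(const float* in, float* out, float threshold, float value) {
    __m256 v    = _mm256_loadu_ps(in);                          // unaligned load
    __m256 mask = _mm256_cmp_ps(v, _mm256_set1_ps(threshold),
                                _CMP_GT_OQ);                    // all-ones where v > threshold
    __m256 res  = _mm256_and_ps(mask, _mm256_set1_ps(value));   // value or +0.0f per lane
    _mm256_storeu_ps(out, res);                                 // unaligned store
}
```

Unlike the mask in the first answer, the threshold and the stored value are separate parameters here, so you can, for example, store 2.0 for every input greater than 1.0.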

Source: https://habr.com/ru/post/1267321/

