Check if a vector contains any element greater than zero

I would be grateful if someone could help me write a function that receives an AVX vector and checks whether it contains any element greater than zero.

I wrote the following code, but it is not optimal, because it stores the elements and then works on them one at a time. The vector should be checked as a whole.

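    #include <immintrin.h>
    #include <stdlib.h>

    int check(__m256 vector)
    {
        float *temp;
        posix_memalign((void **)&temp, 32, 8 * sizeof(float));
        _mm256_store_ps(temp, vector);
        int flag = 0;
        for (int k = 0; k < 8; k++) {
            if (temp[k] > 0) {   /* stop at the first positive element */
                flag = 1;
                break;
            }
        }
        free(temp);              /* freed on every path, including the early exit */
        return flag;
    }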
1 answer

If you are going to branch on the result, the "traditional" compare / movemask / integer-test sequence generally takes the fewest uops, just as it would with SSE1.

    __m256 vcmp = _mm256_cmp_ps(_mm256_setzero_ps(), x, _CMP_LT_OQ);
    int cmp = _mm256_movemask_ps(vcmp);
    if (cmp)
        return 1;

It usually compiles to something like

    vcmplt_oqps ymm2, ymm0, ymm1
    vmovmskps   eax, ymm2
    test        eax, eax
    jnz         .true_branch

These are all single-uop instructions, and test / jnz can macro-fuse on Intel and AMD processors that support AVX, so this is only 3 uops in total (on Intel).
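As a minimal sketch, the compare / movemask / test sequence can be wrapped into a complete function. The names `AVX_FN`, `any_positive` and `any_positive_f32x8` are hypothetical, and the `target("avx")` attribute is a GCC/Clang extension assumed here only so the file builds without `-mavx`; with `-mavx` on the command line it is redundant.

```c
#include <immintrin.h>

/* GCC/Clang-specific (an assumption of this sketch): enable AVX per function. */
#define AVX_FN __attribute__((target("avx")))

/* Returns 1 if any of the 8 floats in x is > 0, else 0.
   _CMP_LT_OQ is an ordered compare, so NaN elements never count. */
AVX_FN static inline int any_positive(__m256 x)
{
    /* Each element becomes all-ones where 0 < x[i], all-zero otherwise. */
    __m256 vcmp = _mm256_cmp_ps(_mm256_setzero_ps(), x, _CMP_LT_OQ);
    /* movemask collects one sign bit per element into the low 8 bits. */
    return _mm256_movemask_ps(vcmp) != 0;
}

/* Hypothetical convenience wrapper: same test on 8 floats in memory.
   Uses an unaligned load, so any float pointer works. */
AVX_FN int any_positive_f32x8(const float *p)
{
    return any_positive(_mm256_loadu_ps(p));
}
```

Zero and negative zero are not greater than zero, so a vector containing only those and negative values should report 0.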

See Agner Fog's instruction tables and microarchitecture guide, and the other guides linked from fooobar.com/tags/x86 / ....


You can also use PTEST, but it is less efficient for this case. See _mm_testc_ps and _mm_testc_pd vs _mm_testc_si128

Without AVX, ptest is convenient for checking whether a register is all-zero without needing extra instructions to copy it (since it sets integer flags directly). But because it is 2 uops and cannot macro-fuse with a jcc branch, it is actually worse than the above here:

    // don't use, sub-optimal
    __m256 vcmp = _mm256_cmp_ps(_mm256_setzero_ps(), x, _CMP_LT_OQ);
    __m256i vcmpi = _mm256_castps_si256(vcmp);   // reinterpret the bits; no instruction emitted
    if (!_mm256_testz_si256(vcmpi, vcmpi)) {
        return 1;
    }

The testz intrinsic is ptest. It sets the ZF and CF flags directly, based on the AND and AND NOT results of its arguments. testz itself returns 1 when the AND result is all-zero, so !testz is true when vcmp has any nonzero bits (which it only will when vcmpps sets some).
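To make the polarity of testz concrete, here is a small sketch. The function names and the `AVX_FN` macro are hypothetical; the `target("avx")` attribute is a GCC/Clang extension, assumed only so the code builds without `-mavx`.

```c
#include <immintrin.h>

/* GCC/Clang-specific (an assumption of this sketch): enable AVX per function. */
#define AVX_FN __attribute__((target("avx")))

/* Returns 1 when all 256 bits at p are zero.
   testz(v, v) returns ZF, and ZF is set when (v AND v) == 0. */
AVX_FN int all_bits_zero_f32x8(const float *p)
{
    __m256i v = _mm256_castps_si256(_mm256_loadu_ps(p));
    return _mm256_testz_si256(v, v);
}

/* PTEST-based variant of the "any element > 0" check:
   negate testz to ask "did the compare set any bit?". */
AVX_FN int any_positive_ptest_f32x8(const float *p)
{
    __m256 vcmp = _mm256_cmp_ps(_mm256_setzero_ps(), _mm256_loadu_ps(p),
                                _CMP_LT_OQ);
    __m256i m = _mm256_castps_si256(vcmp);
    return !_mm256_testz_si256(m, m);
}
```

Note that `all_bits_zero_f32x8` looks at raw bits, so a vector containing -0.0 is not all-zero for it, because the sign bit is set.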

VPTEST with ymm registers needs only AVX. AVX2 is not required, even though it looks like an integer vector instruction.

This will compile to something like

    vcmplt_oqps ymm2, ymm0, ymm1
    vptest      ymm2, ymm2
    jnz         .true_branch

This is probably smaller in code size than the version above, but it is actually 4 uops instead of 3. If you used setnz or cmovnz, macro-fusion would not be a factor, so ptest would break even. As I mentioned above, the main use case for ptest is when you can use it without a comparison instruction and without AVX.

The alternative for checking a vector for all-zero ( pcmpeqb xmm0,xmm1 / pmovmskb eax, xmm0 / test eax,eax ) has to destroy one of the input vectors without AVX, so it requires an extra movdqa to make a copy if you still need both after the test.


ptest floating point bit hacks

I thought that for this particular test it might be possible to skip the comparison instruction and use VPTEST directly to see whether there are any float elements with their sign bit clear but some non-zero bits elsewhere.

Actually no, this idea cannot work, because it does not take element boundaries into account. It could not tell the difference between a vector with a positive element and a vector with a +0.0 element (sign bit clear) plus another, negative element (other bits set).

VPTEST sets ZF = ((src1 & src2) == 0) and CF = ((~src1 & src2) == 0). I thought src1 = set1(0x7FFFFFFF) could tell us something useful about sign bits and non-sign bits, which we could check with a condition that reads both CF and ZF, for example ja (CF = 0 and ZF = 0). In fact, there is no x86 condition that is true only for CF = 1 and ZF = 0, so that is another problem.
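The element-boundary problem can be seen by computing both flags for two vectors. This is a sketch only: the function names and the `AVX_FN` macro are hypothetical, and the `target("avx")` attribute is a GCC/Clang extension assumed so the code builds without `-mavx`. It relies on `_mm256_testz_si256(a, b)` returning 1 when `(a & b)` is all-zero and `_mm256_testc_si256(a, b)` returning 1 when `(~a & b)` is all-zero.

```c
#include <immintrin.h>

/* GCC/Clang-specific (an assumption of this sketch): enable AVX per function. */
#define AVX_FN __attribute__((target("avx")))

/* ZF for src1 = set1(0x7FFFFFFF), src2 = x:
   1 when no non-sign bit is set anywhere in the whole vector. */
AVX_FN int ptest_zf_nonsign(const float *p)
{
    __m256i mask = _mm256_set1_epi32(0x7FFFFFFF);
    __m256i x = _mm256_castps_si256(_mm256_loadu_ps(p));
    return _mm256_testz_si256(mask, x);
}

/* CF for the same operands:
   1 when no sign bit is set anywhere in the whole vector. */
AVX_FN int ptest_cf_nonsign(const float *p)
{
    __m256i mask = _mm256_set1_epi32(0x7FFFFFFF);
    __m256i x = _mm256_castps_si256(_mm256_loadu_ps(p));
    return _mm256_testc_si256(mask, x);
}
```

For `{+0.0, -1.0, 0, ...}` (no positive element) and `{1.0, -1.0, 0, ...}` (one positive element), both functions return 0, so the flags alone cannot separate the two cases. Each flag summarizes the whole 256-bit register, not individual elements.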

Also, NaN > 0 is false, but NaN has some bits set (exponent all-ones, mantissa non-zero, sign bit = don't care, so there can be +NaN and -NaN). If this were the only problem, the trick would still be useful in cases where NaN handling is not required.
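The NaN mismatch is easy to show in scalar code. In this sketch all three function names are hypothetical; `positive_by_bits` stands in for what a pure bit test would conclude about a single element, and the bit pattern 0x7FC00000 is a quiet +NaN in IEEE 754 single precision.

```c
#include <stdint.h>
#include <string.h>

/* Build a float from its raw bit pattern (memcpy is a well-defined type pun). */
float float_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Bit-pattern test: sign bit clear and at least one other bit set. */
int positive_by_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (u >> 31) == 0 && (u & 0x7FFFFFFFu) != 0;
}

/* Ordinary ordered comparison: false for NaN. */
int positive_by_compare(float f)
{
    return f > 0.0f;
}
```

For `float_from_bits(0x7FC00000u)` the bit test says "positive" while the comparison says it is not, which is the disagreement described above. For ordinary values such as 1.5f, -2.0f and 0.0f the two agree.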


Source: https://habr.com/ru/post/1205127/

