When using the GCC vector extensions for C, how can I check that all the values in a vector are zero?
For instance:
    #include <stdint.h>

    typedef uint32_t v8ui __attribute__ ((vector_size (32)));

    v8ui*
    foo(v8ui *mem)
    {
        v8ui v;
        for ( v = (v8ui){ 1, 1, 1, 1, 1, 1, 1, 1 };
              v[0] || v[1] || v[2] || v[3] || v[4] || v[5] || v[6] || v[7];
              mem++)
            v &= *(mem);

        return mem;
    }
SSE4.1 has a PTEST instruction (VPTEST under AVX for 256-bit vectors) that can perform a test like the one used as the for condition, but the code GCC generates just unpacks the vector and checks the elements one by one:
    .L2:
        vandps  (%rax), %ymm1, %ymm1
        vmovdqa %xmm1, %xmm0
        addq    $32, %rax
        vmovd   %xmm0, %edx
        testl   %edx, %edx
        jne     .L2
        vpextrd $1, %xmm0, %edx
        testl   %edx, %edx
        jne     .L2
        vpextrd $2, %xmm0, %edx
        testl   %edx, %edx
        jne     .L2
        vpextrd $3, %xmm0, %edx
        testl   %edx, %edx
        jne     .L2
        vextractf128    $0x1, %ymm1, %xmm0
        vmovd   %xmm0, %edx
        testl   %edx, %edx
        jne     .L2
        vpextrd $1, %xmm0, %edx
        testl   %edx, %edx
        jne     .L2
        vpextrd $2, %xmm0, %edx
        testl   %edx, %edx
        jne     .L2
        vpextrd $3, %xmm0, %edx
        testl   %edx, %edx
        jne     .L2
        vzeroupper
        ret
Is there a way to get GCC to generate an efficient test for this without resorting to the builtin functions?
Update: for reference, here is the code using the non-portable GCC builtin for (V)PTEST:
    #include <stdint.h>

    typedef uint32_t v8ui __attribute__ ((vector_size (32)));
    typedef long long int v4si __attribute__ ((vector_size (32)));

    const v8ui ones = { 1, 1, 1, 1, 1, 1, 1, 1 };

    v8ui*
    foo(v8ui *mem)
    {
        v8ui v;
        for ( v = ones;
              !__builtin_ia32_ptestz256((v4si)v, (v4si)ones);
              mem++)
            v &= *(mem);

        return mem;
    }
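For comparison, the same "are all lanes zero?" check can be written using only the vector extensions, with no builtins. The sketch below reinterprets the vector as four 64-bit lanes and ORs them together, so one comparison replaces eight. The `v4du` type and the `all_zero` and `all_zero_ref` helpers are names made up for this example. The sketch only shows that the condition is equivalent; it does not establish whether GCC compiles it to (V)PTEST.

```c
#include <stdint.h>

typedef uint32_t v8ui __attribute__ ((vector_size (32)));
/* Hypothetical helper type: the same 32 bytes viewed as four 64-bit lanes. */
typedef uint64_t v4du __attribute__ ((vector_size (32)));

/* Returns 1 when every 32-bit lane of *v is zero, 0 otherwise.
   The cast between same-sized vector types reinterprets the bits. */
static int all_zero(const v8ui *v)
{
    v4du t = (v4du)*v;
    return (t[0] | t[1] | t[2] | t[3]) == 0;
}

/* Element-by-element reference: the negation of the original loop condition. */
static int all_zero_ref(const v8ui *v)
{
    v8ui x = *v;
    for (int i = 0; i < 8; i++)
        if (x[i] != 0)
            return 0;
    return 1;
}
```

In the loop from the question, the condition would then read `!all_zero(&v)` in place of the chain of `v[0] || ... || v[7]`.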