(I'm only interested in the first three components.)
For instance:
[ 1 2 3 ? ] should produce [ 0 0 -1 ? ]
In addition, it is important that only one "bit" is set, so that [ 1 2 2 ? ] does not produce [ 0 -1 -1 ? ]
but rather [ 0 -1 0 ? ] or [ 0 0 -1 ? ] (it doesn't matter which one).
One straightforward (but bad) solution is to extract the horizontal maximum and compare it with the original:
__m128 abcd; // input
__m128 ccac = _mm_shuffle_ps(abcd, abcd, 0x8A);
__m128 abcd_ccac = _mm_max_ps(abcd, ccac);
__m128 babb = _mm_shuffle_ps(abcd, abcd, 0x51);
__m128 abcd_ccac_babb = _mm_max_ps(abcd_ccac, babb);
__m128 mask = _mm_cmpeq_ps(abcd, abcd_ccac_babb);
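To make the problem with this version concrete, here is a minimal sketch that wraps it in a function and reports which of the first three lanes end up set. The names `max_mask_naive` and `lanes3` are only illustrative (they are not part of any library), and the sketch assumes an x86 target with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE: _mm_shuffle_ps, _mm_max_ps, _mm_cmpeq_ps, _mm_movemask_ps */

/* Illustrative helper: bit i of the result is set if lane i (i = 0..2) of the mask is set. */
static int lanes3(__m128 mask)
{
    return _mm_movemask_ps(mask) & 0x7;  /* ignore the 4th ("don't care") lane */
}

/* Illustrative name: horizontal max of the first three lanes, compared with the input. */
static __m128 max_mask_naive(__m128 abcd)
{
    __m128 ccac = _mm_shuffle_ps(abcd, abcd, 0x8A);   /* [ c c a c ] */
    __m128 abcd_ccac = _mm_max_ps(abcd, ccac);
    __m128 babb = _mm_shuffle_ps(abcd, abcd, 0x51);   /* [ b a b b ] */
    __m128 hmax = _mm_max_ps(abcd_ccac, babb);        /* lanes 0..2 all hold max(a, b, c) */
    return _mm_cmpeq_ps(abcd, hmax);                  /* every lane equal to the max is set */
}
```

With `_mm_setr_ps(1, 2, 3, 0)` only lane 2 is set, but with `_mm_setr_ps(1, 2, 2, 0)` both lane 1 and lane 2 are set, which is exactly the duplicate I want to avoid.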
Perhaps some bitwise operations to get rid of duplicate set bits?
Update:
As a follow-up, I came up with another (bad) solution.
The key is to compare each component with the other two while avoiding symmetric equality tests (that is, not having a >= b in one place and b >= a in another).
a > b & a >= c
b > c & b >= a
c > a & c >= b
To obtain:
([ a b c ? ] > [ b c a ? ]) & ([ a b c ? ] >= [ c a b ? ])
and in code:
__m128 abcd; // input
__m128 bcad = _mm_shuffle_ps(abcd, abcd, 0xC9);
__m128 gt = _mm_cmpgt_ps(abcd, bcad);
__m128 cabd = _mm_shuffle_ps(abcd, abcd, 0xD2);
__m128 ge = _mm_cmpge_ps(abcd, cabd);
__m128 mask = _mm_and_ps(gt, ge);
When all three components are equal, [ x x x ? ], it fails (it returns [ 0 0 0 ? ]).
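The behavior can be checked with a small sketch like the one below, which wraps the comparison in a function. Again, `max_mask_strict` and `lanes3` are just illustrative names, and an x86 target with SSE is assumed.

```c
#include <xmmintrin.h>  /* SSE: _mm_shuffle_ps, _mm_cmpgt_ps, _mm_cmpge_ps, _mm_and_ps */

/* Illustrative helper: bit i of the result is set if lane i (i = 0..2) of the mask is set. */
static int lanes3(__m128 mask)
{
    return _mm_movemask_ps(mask) & 0x7;  /* ignore the 4th ("don't care") lane */
}

/* Illustrative name: (abc > bca) & (abc >= cab), so at most one of lanes 0..2 is set. */
static __m128 max_mask_strict(__m128 abcd)
{
    __m128 bcad = _mm_shuffle_ps(abcd, abcd, 0xC9);  /* [ b c a d ] */
    __m128 gt = _mm_cmpgt_ps(abcd, bcad);            /* a > b,  b > c,  c > a  */
    __m128 cabd = _mm_shuffle_ps(abcd, abcd, 0xD2);  /* [ c a b d ] */
    __m128 ge = _mm_cmpge_ps(abcd, cabd);            /* a >= c, b >= a, c >= b */
    return _mm_and_ps(gt, ge);
}
```

Ties between two components resolve to a single lane (for example, `_mm_setr_ps(1, 2, 2, 0)` sets only lane 2), but `_mm_setr_ps(5, 5, 5, 0)` sets no lane at all, because every strict comparison is false.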
Close :-)
Any ideas?
Update:
Now I am using the following solution:
__m128 abcd; // input
__m128 bcad = _mm_shuffle_ps(abcd, abcd, 0xC9);
__m128 gt = _mm_cmpgt_ps(abcd, bcad);
__m128 cabd = _mm_shuffle_ps(abcd, abcd, 0xD2);
__m128 ge = _mm_cmpge_ps(abcd, cabd);
__m128 sel = _mm_and_ps(gt, ge);
__m128i bits = _mm_setr_epi32(_mm_movemask_ps(sel), -1, -1, -1);
__m128i dirt = _mm_cmpeq_epi32(bits, _mm_setzero_si128());
__m128i mask = _mm_or_si128(dirt, _mm_castps_si128(sel));
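For anyone who wants to try it, here is the same idea as a self-contained sketch. The function name `max_mask_one_bit` and the helper `lanes3` are illustrative only, and the sketch assumes an x86 target with SSE2 (needed for the integer intrinsics). It returns the mask as `__m128` rather than `__m128i`, purely for convenience when testing.

```c
#include <xmmintrin.h>  /* SSE float intrinsics */
#include <emmintrin.h>  /* SSE2 integer intrinsics: _mm_setr_epi32, _mm_cmpeq_epi32, ... */

/* Illustrative helper: bit i of the result is set if lane i (i = 0..2) of the mask is set. */
static int lanes3(__m128 mask)
{
    return _mm_movemask_ps(mask) & 0x7;  /* ignore the 4th ("don't care") lane */
}

/* Illustrative name: strict/non-strict comparison plus a fallback when no lane was picked. */
static __m128 max_mask_one_bit(__m128 abcd)
{
    __m128 bcad = _mm_shuffle_ps(abcd, abcd, 0xC9);  /* [ b c a d ] */
    __m128 gt = _mm_cmpgt_ps(abcd, bcad);
    __m128 cabd = _mm_shuffle_ps(abcd, abcd, 0xD2);  /* [ c a b d ] */
    __m128 ge = _mm_cmpge_ps(abcd, cabd);
    /* Named sel rather than and, since and is a reserved alternative token in C++. */
    __m128 sel = _mm_and_ps(gt, ge);

    /* Lane 0 holds the 4-bit movemask; lanes 1..3 hold -1 so they never compare equal to 0. */
    __m128i bits = _mm_setr_epi32(_mm_movemask_ps(sel), -1, -1, -1);
    /* dirt = [ -1 0 0 0 ] if no lane was selected, otherwise all zero. */
    __m128i dirt = _mm_cmpeq_epi32(bits, _mm_setzero_si128());
    return _mm_castsi128_ps(_mm_or_si128(dirt, _mm_castps_si128(sel)));
}
```

The 4th lane of `sel` is always zero because `d > d` is never true, so the movemask is zero exactly when none of the first three lanes was selected. In that case the fallback sets lane 0, which is fine since it doesn't matter which of the equal components is chosen.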