Calculating the size of the symmetric difference of two sorted arrays using SIMD AVX

I am looking for a way to optimize the algorithm I'm working on. This is the most repeated and, therefore, computationally intensive part - this is a comparison of two sorted arrays of any size containing unique unsigned integer ( uint32_t) values to get the size of their symmetric difference (the number of elements that exist in only one of the vectors). The target machine on which the algorithm will be deployed uses Intel processors that support AVX2, so I'm looking for a way to execute it in place using SIMD.

Is there a way to use the AVX2 instructions to get the size of the symmetric difference of two sorted arrays of unsigned integers?

+4
source share
2 answers

Since both arrays are sorted, it is quite simple to implement this algorithm using SIMD (AVX2). You just need to go through two arrays at the same time, and then when you get a mismatch when comparing two vectors of 8 ints, you will need to resolve the mismatch, i.e., count the differences, and return the indices of the two arrays in phase and continue until you get to end of one of the arrays. Then just add no from the remaining elements in another array, if any, to get the final score.

+3
source

(, <= 16 ), .

, , PaulR. , (, 10% ), . .

:

int Merge3(const int *aArr, int aCnt, const int *bArr, int bCnt, int *dst) {
    int i = 0, j = 0, k = 0;
    while (i < aCnt - 32 && j < bCnt - 32) {
        for (int t = 0; t < 32; t++) {
            int aX = aArr[i], bX = bArr[j];
            dst[k] = (aX < bX ? aX : bX);
            k += (aX != bX);
            i += (aX <= bX);
            j += (aX >= bX);
        }
    }
    while (i < aCnt && j < bCnt) {
       ... //use simple code to merge tails

:

  • (32 ). (i < aCnt && j < bCnt) t < 32. , .
  • . , cmov, setXX, . : , .

:

  • ( ) (4 + 4) , , , 4- (): 4.95ns 4.65ns --- .
  • ( ) 4 x 4 , 16- , -, _mm256_permutevar8x32_epi32 128- LUT, 8 , , _mm_movemask_ps + 16-entry LUT + _mm_shuffle_epi8 4 : 4.00ns 4.65ns --- .

+ LUT.

P.S. , . , 2 .

+1

Source: https://habr.com/ru/post/1674191/


All Articles