Efficient algorithm for creating n-way intersection of sorted arrays in C

I need to create an intersection between some sorted arrays of integers in C. I know how to find the intersection between two sorted arrays, but I need to do this for more than two arrays, efficiently and without prior knowledge of the number of arrays. I can put a reasonable limit on the maximum number - say, ten at the moment. These arrays can be from several elements to several hundred thousand objects long and not necessarily the same length.

Pseudocode for creating the intersection of two sorted arrays:

while i < m and j < n do: if array1[i] < array2[j]: increment i else if array1[i] > array2[j]: increment j else add array1[i] to intersection(array1, array2) increment i increment j 

I work with C and I get a clear explanation, not code.

+6
source share
6 answers

I assume all your arrays are sorted. Suppose we have arrays A_1 to A_n . Have a counter for each array (so we have n counters i_1 to i_n , just like you did for two arrays).

Now we introduce the minimum heap that contains all the arrays so that the minimum array is the array with the smallest number that the corresponding pointer points to. This means that at every moment we can get the array with the lowest number that it points to.

Now we extract the minimum array from the heap and store it. We continue to extract the minimum array until the specified number remains unchanged. If we retrieve all arrays (i.e. if all arrays have the same low exponent indicating a number), we know that this number is at the intersection. If not (i.e., if not all arrays contain the same low exponent indicating a number), we know that the number we are currently considering cannot be at the intersection. Thus, we increase all the counters to the already extracted arrays and put them back into the heap.

We do this until we find one pointer to the array that reaches the end of the array. I apologize for the detailed description, but I do not have enough time to understand it in more detail.

Change

If you have one array with a very small number of elements, it may be useful to simply perform a binary search for other arrays for these numbers or check these numbers with a hash table.

+2
source

You already have the logic for intersecting on two arrays, so just call it several times.

Your intersection logic

 while i < m and j < n do: if array1[i] < array1[j]: increment i else if array1[i] > array2[j]: increment j else add array1[i] to intersection(array1, array2) increment i increment j 

Encapsulate the above code in Intersect(int[] array1, int[] array2, int array1Length, int array2Length) , which returns int[] . Call the method again by the result.

  • Call1: int[] result = Intersect(array1, array2, array1Length, array2Length)
  • Call2: result = Intersect(result, array3, resultArrayLength, array3Length)
  • ...
  • Call (n-1): result = Intersect(result, arrayn, resultArrayLength, arraynLength)

Possible optimization:

  • Continue calls only if resultArrayLength > 0 (otherwise interest is a null set).
  • In the intersection method, compare the last element of array1 with the first element of array2, i.e.

EDITED if (array1[array1Length - 1] < array2[0]) return an empty set (assuming the arrays are sorted).

+4
source

First I intersect the two smallest arrays, and then intersect the result with the smallest intersected array on the left. This ensures that at each separate intersection, one of the two arrays in question is no larger than the smallest source array, which should save some time.

+2
source

Use the mergesort method.

Let arrays

1 2 3 4 5 6 7.

First find the intrseection of 1 2, 2 4, 5 6, for 7 donot do nothing. New sets: A (1-2 intersections) B (3-4 intersections) C (5-6 intersections) 7. Repeat above until you get one set.

+1
source

You can probably parallelize the calculation, since the intersection operation is commutative and associative. Each thread calculates the intersection of two arrays, which will reduce the number of arrays by two steps.

+1
source

If there are no restrictions on spatial complexity, then this will be easy to achieve with hashmap. Assuming the numbers are not repeated in the same array.

Here is the python code for the same:

 def intersection(arr_list): """ >>> intersection([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]) [3, 4] >>> intersection([[1, 2, 3, 4], [2, 3, 4, 5], [5, 6]]) [] """ n = len(arr_list) ret = [] d = {} for i in arr_list[0]: d[i] = 1 for arr in arr_list[1:]: for j in arr: if j in d: d[j] += 1 for k, v in d.items(): if v == n: ret.append(k) return ret if __name__ == "__main__": import doctest doctest.testmod(verbose=True) 
0
source

Source: https://habr.com/ru/post/885701/


All Articles