Nonspecific algorithm for crossing unsorted C ++

I am trying to find a way to write an efficient algorithm to perform an unsorted intersection on two vectors / arrays, but with no luck. I work with one large, frantic array (usually between 500,000 and 1,000,000 values) and one relatively smaller (possibly 5,000 max values) Unique array.

I have seen many methods suggested here, including methods like unordered_sets, but in my opinion this does not work if one of the arrays is not unique. Secondly, instead of having an output vector that contains numbers common to both arrays, I would like the output vector to contain indices of these common values โ€‹โ€‹relative to a larger array. So, if there are 5 locations in a larger array that are equal to one of the values โ€‹โ€‹in the smaller array, I need each of these 5 indexes. Maybe something similar to the python in1d function.

Does anyone have any ideas? Thanks

+4
source share
3 answers

Put the unique side in unordered_set and go through the unique side one at a time. If you find the non_unique_side[i] element in unordered_set(unique_side) , add i to the result.

Assuming unordered_set is implemented as a hash set with O(1) amortized insertion and search time, this algorithm gets you O(L+S) time complexity, where L is the number of elements in a larger list and S is the number of elements in a smaller set. It is as fast as you can make an intersection.

+3
source

Create another vector containing all the indices from a large array. Then sort the indices using a predicate that uses one level of indirection, and either do the same for a unique array or sort it in place. Then make a normal ordered intersection using a comparison that allows one level of indirection and puts the index from the mapping vector into the final result.

+1
source

You can map a large array from its value to int.

for example: unordered_map<int,int>

When matching a larger array, just increase the value for each element you find

Then you just need to switch to a lower value, and for each value check if it exists on the map. If it exists, add the number of elements in the displayed int to the result vector.

so if you have 5 gears, map [6] = 5 .. so just add 5 copies of 6 to the result of the result.

Edit:

If you need indexes, you can match the int vector and save for each value the vector of the found indices.

0
source

Source: https://habr.com/ru/post/1387266/


All Articles