Find the shortest distance between two buckets of numbers

I have two buckets (unordered, one-dimensional data structures) of numbers, and I want to calculate the minimum distance between any element of one bucket and any element of the other. Is there a way to find the shortest distance between numbers from different buckets in O(1)? What is my best option?

    Input:  [B1] 1, 5, 2, 347, 50
            [B2] 21, 17, 345
    Output: 2 // abs(347 - 345)

Edits:

  • I expect more searches than inserts.
  • The distance between the smallest and largest elements in any bucket is less than 10^5.
  • The number of elements in any bucket is less than 10^5.
  • Numbers in the buckets are "almost" sorted - they are timestamps of events. Probably less than 1% of the items in a bucket are out of order.
  • The number of elements in the buckets is small, but I need to search at an average rate of 2 fps and periodically discard old buckets and replace them with new ones, so I want my search queries to be O(1).

See the previous revision of this question for why I need this and what I have already considered.

+5
7 answers

Here is my attempt: sort each bucket, then merge them while tracking the minimum distance along the way: O(n + 2·(n/2)·log(n/2)) = O(n log n):

    sort(buk1)    // ascending, so top() is the largest remaining element
    sort(buk2)
    mindist = INT_MAX
    while !empty(buk1) and !empty(buk2)
        // the closest cross-bucket pair is adjacent in the merged order,
        // so comparing the two current tops at every step is enough
        mindist = min(mindist, abs(top(buk1) - top(buk2)))
        if top(buk1) > top(buk2)
            pop(buk1)
        else
            pop(buk2)
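A runnable C++ sketch of this merge, using max-heaps (std::priority_queue) to play the role of the sorted buckets' top()/pop(); the function name and the heap choice are my own illustration, not part of the original answer:

    #include <algorithm>
    #include <climits>
    #include <cstdlib>
    #include <queue>
    #include <vector>

    // Walk both buckets from the largest element down, always discarding the
    // larger of the two current tops; every adjacent cross-bucket pair in the
    // merged order gets compared along the way.
    int minDistance(const std::vector<int>& b1, const std::vector<int>& b2) {
        std::priority_queue<int> q1(b1.begin(), b1.end());  // max-heaps
        std::priority_queue<int> q2(b2.begin(), b2.end());
        int mindist = INT_MAX;
        while (!q1.empty() && !q2.empty()) {
            mindist = std::min(mindist, std::abs(q1.top() - q2.top()));
            if (q1.top() > q2.top())
                q1.pop();
            else
                q2.pop();
        }
        return mindist;  // INT_MAX if either bucket was empty
    }

Building the heaps is O(n); the merge itself is O(n log n), matching the estimate above.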
+1

Let n be the total number of elements.
1. Write all the numbers out in binary. ==> O(n)
2. Append a tag bit, 0 or 1, to each number depending on whether it comes from B1 or B2. ==> O(n)
3. Sort them, ignoring the tag bit. ==> O(n log n) on average
4. Iterate over the list in sorted order. For every two adjacent numbers u and v, if both come from B1 or both come from B2, skip them.
Otherwise, set tmp <-- abs(u - v) whenever tmp > abs(u - v). Thus tmp is the minimum distance so far between adjacent numbers from different buckets.
The final tmp is the answer. ==> O(n)

Total: O(n log n) on average
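A minimal C++ sketch of this tag-and-sort approach; representing the tag as the second member of a pair (and the names below) is my own rendering of the answer:

    #include <algorithm>
    #include <climits>
    #include <cstdio>
    #include <cstdlib>
    #include <utility>
    #include <vector>

    // Returns the minimum |u - v| over u in b1, v in b2 (INT_MAX if either is empty).
    int minDistance(const std::vector<int>& b1, const std::vector<int>& b2) {
        std::vector<std::pair<int, int>> tagged;              // (value, bucket tag)
        for (int x : b1) tagged.push_back({x, 0});
        for (int x : b2) tagged.push_back({x, 1});
        std::sort(tagged.begin(), tagged.end());              // sort by value

        int best = INT_MAX;
        for (std::size_t i = 1; i < tagged.size(); ++i)
            if (tagged[i].second != tagged[i - 1].second)     // adjacent, different buckets
                best = std::min(best, std::abs(tagged[i].first - tagged[i - 1].first));
        return best;
    }

    int main() {
        std::printf("%d\n", minDistance({1, 5, 2, 347, 50}, {21, 17, 345}));  // prints 2
    }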

+1

O(1), of course, is impossible.

Some pseudocode that I would use as a starting point:

    sort(B1)
    sort(B2)
    i1 = 0
    i2 = 0
    mindist = MAX_INT   // when one of the buckets is empty, we'll simply return MAX_INT
    while (i1 < B1.size() && i2 < B2.size())
        t = B1[i1] - B2[i2]
        mindist = min(mindist, abs(t))
        if t > 0
            i2++
        else
            i1++
    return mindist

At the very least this is O(n log n), since the sorting at the beginning dominates. If your buckets are already sorted, you can get O(n).
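A direct C++ translation of this pseudocode, under the assumption that the buckets are std::vector<int> (the function name is mine):

    #include <algorithm>
    #include <climits>
    #include <cstdlib>
    #include <vector>

    int minDistance(std::vector<int> b1, std::vector<int> b2) {
        std::sort(b1.begin(), b1.end());
        std::sort(b2.begin(), b2.end());
        std::size_t i1 = 0, i2 = 0;
        int mindist = INT_MAX;                 // returned as-is if a bucket is empty
        while (i1 < b1.size() && i2 < b2.size()) {
            mindist = std::min(mindist, std::abs(b1[i1] - b2[i2]));
            if (b1[i1] > b2[i2])
                ++i2;                          // B2's current element is too small, move on
            else
                ++i1;
        }
        return mindist;
    }

If the buckets arrive already sorted, the two sorts can be dropped and the scan alone is O(n).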

Edit:

After the new information that the elements are almost sorted, I would suggest insertion sort. Insertion sort with binary search is not suitable for this situation; just append the new item and swap it backwards until it fits (see the sketch below). Usually there will be no swaps at all, and in the 1% of cases that do need swaps, 99% of the time it will be just one. The worst case is O(n), but on average it is close to O(1).
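A small C++ sketch of that insertion step, assuming each bucket is kept as a sorted std::vector<int> (the helper name is mine):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Append a new timestamp and swap it backwards until the vector is sorted again.
    // For almost-sorted input this is O(1) on average, O(n) in the worst case.
    void insertAlmostSorted(std::vector<int>& bucket, int value) {
        bucket.push_back(value);
        for (std::size_t i = bucket.size() - 1; i > 0 && bucket[i] < bucket[i - 1]; --i)
            std::swap(bucket[i], bucket[i - 1]);
    }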

If you need a mindist for every pair of buckets, you will have to store i1, i2 and mindist for each pair. Say B1 is the bucket into which you add a new item. After inserting it in sorted position, decrease i2 until it is 0 or B2[i2] < B1[i1]. Since the items are timestamps, this will be at most one step in most cases. Then you run the while loop again, which will usually also be just one step. That gives O(k) computation per insertion for k buckets and O(k^2) memory.

+1

Create a bit vector of 10^5 elements for each bucket. Keep track of the minimum distance (initially 10^5, until both buckets are non-empty).

Now, say you add an element x to one of the buckets. Follow these steps:

    1. Set bit x in that bucket's bit vector.
    2. Check whether the other bit vector has any set bits within min_distance - 1 of x.
    3. Update min_distance as appropriate.

Runtime: O(min_distance) per insert, which is technically O(1) since min_distance is bounded. Querying is O(1), since you just return min_distance.
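A rough C++ sketch of this scheme, assuming values fall in [0, 10^5) as in the question's constraints (the class and member names are mine):

    #include <algorithm>
    #include <bitset>

    const int RANGE = 100000;  // 10^5, from the question's constraints

    struct MinDistTracker {
        std::bitset<RANGE> present[2];   // one bit vector per bucket
        int min_distance = RANGE;        // answer so far; RANGE means "no pair yet"

        // Insert value x (0 <= x < RANGE) into bucket b (0 or 1).
        void insert(int b, int x) {
            present[b][x] = true;
            // Look for set bits in the other bit vector within min_distance - 1 of x.
            for (int d = 0; d < min_distance; ++d) {
                int lo = x - d, hi = x + d;
                if ((lo >= 0 && present[1 - b][lo]) || (hi < RANGE && present[1 - b][hi])) {
                    min_distance = d;
                    break;
                }
            }
        }

        int query() const { return min_distance; }  // O(1)
    };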

Edit: if the elements themselves are not bounded by 10^5, but only the distance between the min and the max is, this will need to be adjusted, but it will still work. I can detail the necessary changes if it matters.

+1

Insert your buckets into two Y-fast tries ( https://en.wikipedia.org/wiki/Y-fast_trie ). Searching for the nearest successor or predecessor is O(log log M), where M is the range (strictly speaking the maximum element, but we can compensate), which in your case comes down to about four operations.

Since you keep the smallest difference found so far, a search is O(1) (as long as you are not handed entirely new buckets every time but update incrementally), while inserting, deleting and updating an element is O(log log M).
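The C++ standard library has no Y-fast trie, so here is a sketch of the same bookkeeping with std::set standing in for the trie; predecessor/successor queries become O(log n) instead of O(log log M), but the structure of the solution is unchanged (the names are mine):

    #include <algorithm>
    #include <climits>
    #include <cstdlib>
    #include <iterator>
    #include <set>

    struct MinDistTracker {
        std::set<int> bucket[2];
        int mindist = INT_MAX;   // smallest cross-bucket difference seen so far

        // Insert x into bucket b and compare it with its nearest neighbours
        // (successor and predecessor) in the other bucket.
        void insert(int b, int x) {
            bucket[b].insert(x);
            const std::set<int>& other = bucket[1 - b];
            auto it = other.lower_bound(x);              // first element >= x
            if (it != other.end())
                mindist = std::min(mindist, std::abs(*it - x));
            if (it != other.begin())
                mindist = std::min(mindist, std::abs(*std::prev(it) - x));
        }

        int query() const { return mindist; }  // O(1)
    };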

+1

I like Dave Galvin's idea, slightly modified:

Let maxV be the maximum value stored in either bucket: maxV = max(max(bucket1), max(bucket2)).

1. Build two arrays of size maxV + 1, initialized to 0, and fill them in:

    for (j = 0 to bucket1.size) array1(bucket1(j)) = bucket1(j)
    for (j = 0 to bucket2.size) array2(bucket2(j)) = bucket2(j)

The values now sit in the arrays in sorted order (each value at its own index); all remaining array elements are 0.

2. Now use two iterators: one for each array:

    it1 = array1.begin
    it2 = array2.begin
    while (*it1 == 0) ++it1                // move each iterator to its first stored value
    while (*it2 == 0) ++it2
    minDist = abs(*it1 - *it2)
    while (it1 != array1.end && it2 != array2.end) {
        if (*it1 == *it2)                  // well, minDist = 0, return now
            return 0
        // advance the iterator pointing at the smaller value to its next stored
        // value (skipping zeros) and compare the new pair
        if (*it1 < *it2) {
            do ++it1 while (it1 != array1.end && *it1 == 0)
            if (it1 != array1.end) minDist = min(minDist, abs(*it1 - *it2))
        } else {
            do ++it2 while (it2 != array2.end && *it2 == 0)
            if (it2 != array2.end) minDist = min(minDist, abs(*it1 - *it2))
        }
    }
    return minDist

Step 1 is O(n) once the arrays are allocated (which is O(maxV)). Step 2 is O(maxV), since it walks the whole value range. I don't know whether this is more efficient than sorting the buckets; it depends on how large the buckets are relative to the value range.
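A compact C++ rendering of the same idea; instead of two leapfrogging iterators it makes a single pass over the value range, remembering the last value seen from each bucket (this restructuring, and the names, are mine; it assumes non-negative values):

    #include <algorithm>
    #include <climits>
    #include <vector>

    int minDistance(const std::vector<int>& bucket1, const std::vector<int>& bucket2) {
        if (bucket1.empty() || bucket2.empty()) return INT_MAX;
        int maxV = std::max(*std::max_element(bucket1.begin(), bucket1.end()),
                            *std::max_element(bucket2.begin(), bucket2.end()));
        std::vector<char> present1(maxV + 1, 0), present2(maxV + 1, 0);  // step 1
        for (int x : bucket1) present1[x] = 1;
        for (int x : bucket2) present2[x] = 1;

        int last1 = -1, last2 = -1;          // most recent value seen from each bucket
        int minDist = INT_MAX;
        for (int v = 0; v <= maxV; ++v) {    // step 2
            if (present1[v]) {
                if (last2 != -1) minDist = std::min(minDist, v - last2);
                last1 = v;
            }
            if (present2[v]) {
                if (last1 != -1) minDist = std::min(minDist, v - last1);
                last2 = v;
            }
        }
        return minDist;
    }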

0

Consider precomputing the answer for each number in both lists and saving the results in an array. Use the index of each number in its list to index into a parallel array that holds the difference to the nearest number in the other list.

This gives an O(1) lookup.
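A sketch of that precomputation in C++, using a sorted copy of the other list and binary search to find each element's nearest counterpart; the names and the binary-search choice are mine, since the original answer does not say how the differences are computed:

    #include <algorithm>
    #include <climits>
    #include <cstdlib>
    #include <iterator>
    #include <vector>

    // nearestDiff[i] = distance from b1[i] to the closest element of b2.
    // The overall answer is the minimum of this array; looking it up later is O(1).
    std::vector<int> precomputeDiffs(const std::vector<int>& b1, std::vector<int> b2) {
        std::sort(b2.begin(), b2.end());
        std::vector<int> nearestDiff(b1.size(), INT_MAX);
        for (std::size_t i = 0; i < b1.size(); ++i) {
            auto it = std::lower_bound(b2.begin(), b2.end(), b1[i]);  // first element >= b1[i]
            if (it != b2.end())
                nearestDiff[i] = std::min(nearestDiff[i], std::abs(*it - b1[i]));
            if (it != b2.begin())
                nearestDiff[i] = std::min(nearestDiff[i], std::abs(*std::prev(it) - b1[i]));
        }
        return nearestDiff;
    }

Note that whenever a bucket changes, the affected entries have to be recomputed, so this mainly helps when searches heavily outnumber inserts, as the question states.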

0

Source: https://habr.com/ru/post/1260532/

