Finding the kth smallest element in a union of sorted arrays

Question

Finding the kth smallest element in a union of sorted arrays

I studied an article about finding the kth smallest element in combining two sorted arrays in leetcode . I do not think the algorithm is correct. There is such a line: We notice that when Ai <Bj, then it must be true that Ai <Bj-1. On the other hand, if Bj <Ai, then Bj <Ai-1. . How can this be true for any i and j ?

And secondly, this line also puzzles me: We are trying to approach this difficult problem by comparing the middle elements A and B, which we identify as Ai and Bj. If Ai is between Bj and Bj-1, we just found i + j + 1 the smallest element , although this is true. Can anyone explain the reason? I really want to understand algo, I did this by combining arrays, but it takes O(N) compared to O(log N) here.

+4

python algorithm

SexyBeast Sep 23 '12 at 20:00

source share

2 answers

You interpret these statements in isolation, but they are built on top of each other. Here is the text that (I think) you have in mind:

Maintaining the invariant i + j = k - 1, If Bj-1 <Ai <Bj, then Ai should be the kth smallest, or if Ai-1 <Bj <Ai, then Bj should be the kth smallest. If one of the above conditions is met, we are done. If not, we will use i and j as a composite index to separate arrays. But how? Which part should we drop? How about Ai and Bj themselves?
We remark that when Ai <Bj, then it must be true that Ai <Bj-1. On the other hand, if Bj <Ai, then Bj <Ai-1. Why?

Interrupting this in the subheadings gives the following interpretation (bearing in mind that indexing starts at 0 , but A0 is the first smallest element, and A1 is the second smallest element, etc.):

i + j = k - 1 (invariant, by definition)
Put Bj-1 < Ai < Bj . Then Ai should be k th smallest. This is due to the fact that Ai greater than i elements in A and more than j elements in B Thus, this is more than just i + j = k - 1 elements. This means that its index in the combined list A|B will be k - 1 , and therefore it will be the k element in this list.
Put Ai-1 < Bj < Ai . Then Bj should be k smallest, along the same line of reasoning in 2.
Now suppose that both (a) Bj-1 < Ai < Bj and (b) Ai-1 < Bj < Ai are false. Then it is obvious that if Ai < Bj then A1 < Bj-1 , since otherwise (a) would be true. Similarly, if Bj < Ai then Bj < Ai-1 , since otherwise (b) will be true.

I take your word that you want to explain these statements, and not the algorithm as a whole. (But I will say more if you want.)

Please also note that since Daniel Fisher answers me, the above reasoning is only valid if there are no duplicates; call this sentence 0.

+7

senderle Sep 23 '12 at 20:24

source share

Daniel Fischer · Accepted Answer · 2012-09-23T20:26:51+0000

We remark that for Ai < Bj , it must be true that Ai < Bj-1 . On the other hand, if Bj < Ai , then Bj < Ai-1 .. How can this be true for any i and j ?

This is not true for all pairs i and j . The article deals with a special situation.

Firstly, it is assumed that there are no duplicates, even in the form of common elements A and B Secondly, the conclusion that

 Ai < Bj ==> Ai < Bj-1, resp. Bj < Ai ==> Bj < Ai-1

provided that neither

 Bj-1 < Ai < Bj resp. Ai-1 < Bj < Ai

takes place. So, excluding these configurations, Ai < Bj ==> Ai <= Bj-1 and Bj < Ai ==> Bj <= Ai-1 follow immediately, and strict inequalities follow from the assumption that there are no duplicates.

We are trying to approach this complex problem by comparing the middle elements A and B, which we identify as Ai and Bj. If Ai is between Bj and Bj-1, we just found i + j + 1 smallest element

In array B there are j elements less than Bj , and in array A there are i elements less than Ai (indices start at 0). Therefore, if Bj-1 < Ai < Bj , both arrays together contain exactly j + i elements smaller than Ai .

What will change if there are duplicates?

Little.

We are still considering a situation where i + j = k-1 . Suppose that Ai <= Bj .

What if Ai = Bj ?
What if Ai < Bj ?

In case 1. let m be the smallest index such that Am = Ai and n smallest index such that Bn = Bj . Then in both arrays together there are exactly m + n <= i + j = k-1 elements strictly smaller than Ai , and at least (i+1) + (j+1) = (k+1) elements, not exceeding Ai . Therefore, the kth smallest element is equal to Ai .

For 2. we consider three cases: a) Bj-1 < Ai , b) Bj-1 = Ai , c) Bj-1 > Ai .

In case a) we have j elements in B that are no more than Ai , and all of them are strictly less, and elements m <= i in A that are strictly less than Ai ( m , as indicated above) and an odd number, but not less than i-m+1 elements equal to Ai . Thus, in both arrays exactly j + m <= j + i = k-1 elements that are strictly less than Ai and at least j + m + (i-m+1) = j+i+1 = k elements exceeding Ai ; therefore, the kth smallest element of both arrays together is equal to Ai .

In case b), the same argument shows that the kth smallest element of both arrays together is equal to Ai .

In the remaining case, Ai < Bj-1 situation becomes a little more complicated. Array B contains at least j elements no more than Bj-1 , and array A contains at least i+1 elements strictly smaller than Bj-1 , so the kth smallest element of both arrays together is in most of Bj-1 . But it cannot be less than Ai ( B contains no more than j-1 elements less than Ai , therefore both arrays together contain no more than i + (j-1) = k-2 elements less than Ai ).

Thus, we can discard the part below Ai from array A and the part above Bj-1 from array B and act as without duplicates.

All that changed was that several strict inequalities had to be replaced by weak inequalities.

Code (would be more efficient if starting indexes and lengths were passed instead of slicing, but slicing gives a shorter code):

 def kthsmallest(A, B, k): if k < 1: return None a_len, b_len = len(A), len(B) if a_len == 0: return B[k-1] # let it die if B is too short, I don't care if b_len == 0: return A[k-1] # see above # Handle edge case: if k == a_len + b_len, we would # get an out-of-bounds index, since i + j <= a_len+b_len - 2 # for valid indices i and j if a_len + b_len == k: if A[-1] < B[-1]: return B[-1] else: return A[-1] # Find indices i and j approximately proportional to len(A)/len(B) i = (a_len*(k-1)) // (a_len+b_len) j = k-1-i # Make sure the indices are valid, in unfortunate cases, # j could be set to b_len by the above if j >= b_len: j = b_len-1 i = k-1-j if A[i] <= B[j]: if j == 0 or B[j-1] <= A[i]: return A[i] # A[i] < B[j-1] <= B[j] return kthsmallest(A[i:], B[:j], ki) # B[j] < A[i], symmetrical to A[i] < B[j] if i == 0 or A[i-1] <= B[j]: return B[j] # B[j] < A[i-1] return kthsmallest(A[:i], B[j:], kj)

Finding the kth smallest element in a union of sorted arrays

What will change if there are duplicates?

More articles: