Find the median of the sum of the arrays

Given two sorted arrays of length n, the question is to find in O (n) time the median of their sum array, which contains all possible pairwise sums between each element of array A and each element of array B.

For example: let A [2,4,6] and B [1,3,5] be two given arrays. The total array is [2+1,2+3,2+5,4+1,4+3,4+5,6+1,6+3,6+5] . Find the median of this array in O (n).

The solution to the question in O (n ^ 2) is pretty straight forward, but is there any solution O (n) to this problem?

Note. This is an interview question asked by one of my friends, and the interviewer was absolutely sure that it could be resolved in O (n) time.

+45
arrays algorithm median
Jun 26 '13 at 9:51 on
source share
4 answers

The correct solution to O (n) is quite complex and requires a significant amount of text, code and skills to explain and prove. More precisely, it does 3 pages to make it convincing, as can be seen in more detail here http://www.cse.yorku.ca/~andy/pubs/X+Y.pdf (found by simonzack in the comments).

This is basically a smart separation and rest algorithm, which, among other things, uses the fact that in a sorted n-on-n-matrix you can find in O(n) number of smaller elements / more than a given number k . It recursively breaks the matrix into smaller submatrices (taking only odd rows and columns, which leads to a submatrix having n/2 colums and n/2 rows), which, combined with the above step, leads to complexity O(n) + O(n/2) + O(n/4)... = O(2*n) = O(n) , This is crazy!

I cannot explain it better than paper , so I will explain the simpler solution O(n logn) :) .




O (n * logn):

This is an interview! You cannot get this O(n) solution on time. So, why not provide a solution that, although not optimal, shows what you can do better than the other obvious O(n²) candidates?

I will use the O(n) algorithm described above to find the number of numbers that are less than / greater than a given number k in the sorted n-by-n matrix. Keep in mind that we do not need a real matrix! The Cartesian sum of two arrays of size n , as described by OP, leads to a sorted matrix n-by-n , which we can model by considering the elements of the array as follows:

 a[3] = {1, 5, 9}; b[3] = {4, 6, 8}; //a + b: {1+4, 1+6, 1+8, 5+4, 5+6, 5+8, 9+4, 9+6, 9+8} 

Thus, each row contains non-decreasing numbers, as well as each column. Now pretend you're assigned the number k . We want to find in O(n) how many numbers in this matrix are less than k , and how many are more. Clearly, if both values ​​are less than (n²+1)/2 , this means that k is our median!

The algorithm is pretty simple:

 int smaller_than_k(int k){ int x = 0, j = n-1; for(int i = 0; i < n; ++i){ while(j >= 0 && k <= a[i]+b[j]){ --j; } x += j+1; } return x; } 

This basically counts how many elements match the condition in each row. Since the rows and columns are already sorted as shown above, this will give the correct result. And since both i and j repeat no more than n times each, the O(n) algorithm [Note that j does not get reset in the for loop]. The greater_than_k algorithm greater_than_k similar.

Now, how do we choose k ? This is part of logn . Binary Search! As mentioned in other answers / comments, the median should be the value contained in this array:

candidates[n] = {a[0]+b[n-1], a[1]+b[n-2],... a[n-1]+b[0]}; .

Just sort this array [also O(n*logn) ] and run a binary search on it. Since the array is now in non-decreasing order, he directly noticed that the number of numbers smaller than each candidate[i] is also a non-decreasing value (monotone function), which makes it suitable for binary search. The largest number k = candidate[i] , the result of which smaller_than_k(k) returns less than (n²+1)/2 , is the answer and obtained in log(n) iterations:

 int b_search(){ int lo = 0, hi = n, mid, n2 = (n²+1)/2; while(hi-lo > 1){ mid = (hi+lo)/2; if(smaller_than_k(candidate[mid]) < n2) lo = mid; else hi = mid; } return candidate[lo]; // the median } 
+14
Jun 27 '13 at 2:05
source share

Let them say that the arrays A = {A[1] ... A[n]} and B = {B[1] ... B[n]} , and the pair array of sums is C = {A[i] + B[j], where 1 <= i <= n, 1 <= j <= n} , which has elements n^2 , and we need to find its median.

Median C must be an element of the array D = {A[1] + B[n], A[2] + B[n - 1], ... A[n] + B[1]} : if you correct A[i] and consider all the sums A[i] + B[j] , you will see that only A[i] + B[j = n + 1 - i] (which is one of D ) can be median. That is, it may not be a median, but if it is not, then all the other A[i] + B[j] also not median.

This can be proved by looking at all B[j] and counting the number lower and the number of values greater than A[i] + B[j] (we can do this fairly accurately because the two arrays are sorted - calculation is a bit of a dirty idea). You will see that for A[i] + B[n + 1 - j] these two accounts are most “balanced”.

Then the problem boils down to finding the median D , which has only elements n . The Hoare algorithm will work.

UPDATE : this answer is incorrect. The real conclusion here is that the median is one of the elements of D , but then the D median is not the same as the C median.

+1
Jun 27 '13 at 0:52
source share

Doesn't that work ?:

You can calculate the rank of a number in linear time while A and B are sorted. The technique used to calculate the rank can also be used to find all things in A+B that are between some lower bound and some upper bound in time, the linear size of the output plus |A|+|B| .

Randomly select n things from A+B Take the median, say foo . Calculate the rank of foo . With constant probability, the rank foo is within n median rank. Continue to do this (expected constant number of times) until you get the lower and upper borders of the median within 2n each other. (This whole process requires the expected linear time, but it is clearly slowing down.)

All you have to do is enumerate everything between the boundaries and choose linear time in linear size.

(It makes no difference, I would not apologize to the interviewer for asking such a question with a shitty interview. Such things in no way indicate your ability to code.)

EDIT . You can calculate the rank of x by doing something like this:

 Set i = j = 0. While j < |B| and A[i] + B[j] <= x, j++. While i < |A| { While A[i] + B[j] > x and j >= 0, j--. If j < 0, break. rank += j+1. i++. } 

FURTHER IMAGE . Actually the aforementioned trick only narrows the space of candidates to approximately n log (n) members A+B Then you have a general problem choosing in a universe of size n log (n); you can do the same trick again and find a range of sizes proportional to sqrt (n) log (n) where you make the selection.

Here's why: If you select k things from an n-set and accept the median, then the median median order is between (1/2 - sqrt (log (n) / k)) th and (1/2 + sqrt (log (n) / k)) th elements with at least a constant probability. When n = | A + B |, we want to take k = sqrt (n), and we get a range near the sqrt (n log n) elements --- about | A | log | A |. But then you do it again, and you get a range of order srp (n) polylog (n).

0
Jun 27 '13 at 1:23
source share

You must use a selection algorithm to find the median of an unsorted list in O (n). Take a look at this: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

0
Jul 02 '13 at 21:45
source share



All Articles