Finding the median of an unsorted array

To find the median of an unsorted array, we can do a min-heap in O (nlogn) time for n elements, and then we can extract one on one n / 2 elements to get the median. But this approach will take O (nlogn) time.

Can we do the same with some method in O (n) time? If we can, then please tell or suggest some method.

+43
heap algorithm median
May 19 '12 at 3:13
source share
6 answers

You can use the Median of Medians algorithm to find the median of an unsorted array in linear time.

+35
May 19 '12 at 3:23
source share

I already supported @ dasblinkenlight's answer, as the Median of Medians algorithm actually solves this problem in O (n) time. I just want to add that this problem can be solved O (n) times, using heaps as well. Heap building can be done O (n) times using bottom up. Take a look at the following article for a detailed explanation of Sort Heap

Suppose your array has N elements, you need to collect two heaps: MaxHeap, containing the first N / 2 elements (or (N / 2) +1, if N is odd) and MinHeap, containing the remaining elements, If N is odd, then your median is the maximum element of MaxHeap (O (1) by getting max). If N is even, then your median is (MaxHeap.max () + MinHeap.min ()) / 2, which also takes O (1). Thus, the real value of the entire operation is the heap construction operation, which is O (n).

By the way, this MaxHeap / MinHeap algorithm also works when you do not know the number of array elements in advance (if you need to solve the same problem for a stream of integers, for example, for example). For more information on how to solve this problem, see the next article. Median of Integer Streams

+15
Apr 11 '15 at 12:52
source share

Quickselect works in O (n), this is also used in the Quicksort splitting phase.

+10
May 19 '12 at 3:24
source share

The quick pick algorithm can find the kth smallest array element in linear ( O(n) ) runtime. Here is the implementation in python:

 import random def partition(L, v): smaller = [] bigger = [] for val in L: if val < v: smaller += [val] if val > v: bigger += [val] return (smaller, [v], bigger) def top_k(L, k): v = L[random.randrange(len(L))] (left, middle, right) = partition(L, v) # middle used below (in place of [v]) for clarity if len(left) == k: return left if len(left)+1 == k: return left + middle if len(left) > k: return top_k(left, k) return left + middle + top_k(right, k - len(left) - len(middle)) def median(L): n = len(L) l = top_k(L, n / 2 + 1) return max(l) 
+9
Mar 31 '14 at 0:58
source share

This can be done using the Quickselect algorithm in O (n), see K-order statistics (randomized algorithms).

0
May 19 '12 at 8:12
source share

As Wikipedia says, the median median is theoretically o (N), but it is not used in practice because the overhead of finding β€œgood” reference points makes it too slow. http://en.wikipedia.org/wiki/Selection_algorithm

Here is the Java source for the Quickselect algorithm to find the kth element in an array:

 /** * Returns position of k'th largest element of sub-list. * * @param list list to search, whose sub-list may be shuffled before * returning * @param lo first element of sub-list in list * @param hi just after last element of sub-list in list * @param k * @return position of k'th largest element of (possibly shuffled) sub-list. */ static int select(double[] list, int lo, int hi, int k) { int n = hi - lo; if (n < 2) return lo; double pivot = list[lo + (k * 7919) % n]; // Pick a random pivot // Triage list to [<pivot][=pivot][>pivot] int nLess = 0, nSame = 0, nMore = 0; int lo3 = lo; int hi3 = hi; while (lo3 < hi3) { double e = list[lo3]; int cmp = compare(e, pivot); if (cmp < 0) { nLess++; lo3++; } else if (cmp > 0) { swap(list, lo3, --hi3); if (nSame > 0) swap(list, hi3, hi3 + nSame); nMore++; } else { nSame++; swap(list, lo3, --hi3); } } assert (nSame > 0); assert (nLess + nSame + nMore == n); assert (list[lo + nLess] == pivot); assert (list[hi - nMore - 1] == pivot); if (k >= n - nMore) return select(list, hi - nMore, hi, k - nLess - nSame); else if (k < nLess) return select(list, lo, lo + nLess, k); return lo + k; } 

I did not include the source of the comparison and swap methods, so it's easy to change the code to work with Object [] instead of double [].

In practice, you can expect the above code to be o (N).

0
Dec 31 '15 at 23:09
source share



All Articles