How to find the median of numbers in linear time using heaps?

Wikipedia says:

Selection algorithms: search for min, max, both min and max, median , or even the k-th largest element can be performed in linear time using heaps.

All he says is that it can be done, not how.

Can you talk a little about how this can be done with heaps?

+49
heap algorithm time-complexity median
Apr 05 '10 at
source share
7 answers

You would use the min-max-median heap to find min, max and median in constant time (and take linear time to create the heap). You can use order statistics trees to find the kth smallest / largest value. Both of these data structures are described in this article on the min-max heap [pdf link] . Min-max heaps are binary heaps that alternate between minimum heaps and maximum heaps.

From the article: Min-max-median heap is a binary heap with the following properties:

1) The median of all elements is at the root

2) The left subtree of the root is a bunch of min-max Hl-sized ceilings [((n-1) / 2)] containing elements that are less than or equal to the median. The right subtree is a heap of max-min Hr of size floor [((n-1) / 2)] containing only elements greater than or equal to the median.

The rest of the article explains how to build such a bunch.

Editing. When reading paper in more detail, it seems that building min-max-median heaps requires that you first find the median (FTA: "Find the median of all n elements using any known linear time algorithm"). However, once you have built the heap, you can maintain the median simply by maintaining a balance between the min-max heap on the left and the max-min heap on the right. DeleteMedian replaces the root with either the minimum heap max-min or the max heap min-max (depending on which balance is retained).

So, if you plan to use the min-max-median heap to find the median of a fixed dataset, then you are SOL, but if you use it in a changing dataset, it is possible.

+21
Apr 09 '10 at 23:34 on
source share

See this wikipedia page for selection algorithms . In particular, consider the BFPRT algorithm and the Median of Medians algorithm. BFPRT is probabilistically linear and is modeled on quicksort; The median of the medians is guaranteed to be linear, but has a large constant coefficient, so in practice it may take longer, depending on the size of your data set.

If you have only a few hundred or thousands of elements from which you can choose the median, I suspect that simple quick sorting, followed by direct indexing, is easiest.

+4
Apr 05 '10 at 18:08
source share

There are probably better algorithms, but here's how I do it:

Have two buckets and a value. The value is median, two buckets β€œmore than median” and β€œless than median”. For each element x in the array, balancing buckets such that big_bucket and small_bucket differ by no more than 1 in size. When moving items from a large bucket to a small bucket, they must first go through the median value to get there (that is, a difference of 2 will successfully remove an item from one bucket to another - a difference of 1 will push the item from one bucket to the median value.) At the end of your first pass through the array, the value should be your median.

+4
Apr 05 '10 at 18:08
source share

perhaps this was not the case when the original question was asked, but now the wiki has a link to the source, and here it is: http://ftp.cs.purdue.edu/research/technical_reports/1991/TR%2091-027. pdf

go to page 17 and see the RSEL4 description. They prove in Theorem 3.2 that the time complexity of this kth selection algorithm is O (k). so you need O (n) to create the heap and extra O (k) to find the kth smallest element.

it's not as straight forward as some other answers suggested

+3
Feb 15 2018-12-12T00:
source share

If you know more about the heap data structure, you will easily realize that this is true. the heap structure can be built in O (n) time, there are a bunch of minutes and a maximum heap. min heap root will give you the smallest element. max heap root element will give you the maximum element. Just by creating a bunch, you will find min and max. the same idea for the median and kth largest, when building your heap, you can find the median and kth largest in size by looking at the left or right branch of the tree and keeping a constant amount of memory to store the item number. and etc.

0
Apr 08 '10 at 3:32
source share

Store the first integer in the array and set the counter to 1. Then draw the remaining integers in the vector. If the current integer in the array matches the one that is stored, the counter is incremented by one, otherwise the counter is decremented by one. If the counter ever reaches zero, discard the stored integer and replace it with the current integer in the array. When you finally go through all the integers, you will be left with one candidate. Then you need to loop through the array again and calculate the likelihood of a candidate appearing to make sure that it is truly a dominant.

 static int FindDominator(int[] arr) { int counter = 1; int candidate = arr[0]; for(int i = 1; i < n; i++) { if(arr[i] == candidate) counter++ else { counter--; if(counter == 0) { candidate = arr[i]; counter = 1; } } } counter = 0; for(int i = 0; i < n; i++) { if(arr[i] == candidate) counter++; } if(counter > n / 2) return candidate; else return -1; } 
0
Mar 26 '15 at 17:34
source share

Obviously, min and max in O (n) are easy and do not require a heap.

The K'-largest can be done quite simply by maintaining a k-sized bunch of the top k values ​​so far. Runtime will be O (n * logk). You can call this linear time if k is a fixed size and k <n.

I do not think median is possible. It takes O (n * logn) time to create a heap of size O (n).

Edit: Well, thinking about this a bit more, IVlad is right. You can create a bunch in O (n) for a fixed size. But ... this does not help the OP with its median question. The linear heap creation method creates a real heap as the end result. A simple approach to doing n inserts, resulting in a real heap after each O (n * logn) step.

It seems to me that using heaps to find the median will require using those who work with heaps. For example, a response was posted (which now seems deleted) related to a blog post suggesting an algorithm for this problem. He tracked the current median using two heaps (the smaller half and the larger half) as he performs one data pass. This will require a slower, naive approach to the heap because it depends on keeping the actual heaps as it inserts and removes from them.

Is there any other way to find the median using the one-shot heap method using a single frame?

-one
Apr 05 '10 at 18:12
source share



All Articles