TARGET ELEMENTS:
If your item type is an integer, then the best way is to have the bucket for each number in any of your subranges, where each bucket is used to count the number of its associated integer found in your input elements (for example, bucket[100] stores the number 100 in your input sequence). You can basically achieve this in the following steps:
- create buckets for each number in any of your subranges.
- iterate over all elements for each number
n , if we have bucket[n] , then bucket[n]++ . - calculate medians based on the aggregated values โโstored in your buckets.
Put it another way, suppose you have a subrange [0, 10] and you want to calculate the median. The bucket approach basically calculates how many 0 are present in your inputs, and how many 1 are in your inputs, and so on. Suppose that n numbers are in the range [0, 10] , then the median is n/2 th is the largest element that can be identified by finding i so that bucket[0] + bucket[1] ... + bucket[i] greater than or equal to n/2 , but bucket[0] + ... + bucket[i - 1] less than n/2 .
The best part is that even your input elements are stored on several machines (i.e. in a distributed case), each machine can support its own buckets, and only aggregated values โโshould go through the intranet.
You can also use hierarchical buckets that include several passes. In each pass, bucket[i] counts the number of elements in your input that lies in a certain range (for example, [i * 2^K, (i+1) * 2^K] ), and then narrows down the problem space, determining which bucket will be located after each step, then decrease K by 1 in the next step and repeat until you can identify the media correctly.
FLOATING ELEMENTS
All elements can fit into memory:
If all of your elements can fit into memory, the best option would be to first sort the element N, and then search for the medians for each subband. A linear time heap solution also works well in this case if the number of your subbands is less than logN .
All elements cannot be inserted into memory, but stored in one machine:
Typically, this external sorting as an example) focused on this problem.