Median of Median Understanding

I searched the net and visited the wiki page for the median of the median algorithm. But, it seems, it cannot find an explicit statement to my question:

If you have a very large list of integers (TBs in size) and want to find the median of this list in a distributed way, it splits the list into sub-lists of different sizes (or equal to actually no matter), then proceed to calculate the medians of these smaller subscriptions, and then calculate the median of these medians as a result of the median of the original large list?

In addition, is this statement also true for any of the kth statistics? I would be interested in links to research, etc. In this region.

+4
source share
2 answers

The answer to your question is no.

If you want to understand how to actually select statistics of k-th order (including median, of course), in parallel tuning (distributed tuning, of course, is not completely different), look at this recent article in which I proposed a new algorithm that improves previous modern parallel selection algorithm:

Deterministic parallel selection algorithms on coarse-grained multi-computers

Here we use two weighted 3-medians as reference points and split them around these pentagons using a five-position split. We have also implemented and tested the algorithm using MPI. The results are very good, given that this is a deterministic algorithm using the worst-case O (n) selection algorithm. Using the O (n) QuickSelect randomized algorithm provides an extremely fast parallel algorithm.

+12
source

If you have a very large list of integers (TBs by size) and want to find the median of this list in a distributed way, it splits the list into sub-lists of different sizes (or equal is not really matter), then proceed to calculate the medians of these smaller subscriptions, and then calculate the median of these medians in the median of the original large list?

No. The actual median of the entire list is not necessarily the median of any of the subscriptions.

Median medians may give you a good point selection for quick selection because it is closer to the actual median than a randomly selected item, but you will need to run the rest of the quickselect algorithm to find the actual median a larger list.

+7
source

Source: https://habr.com/ru/post/1385820/


All Articles