How to find the median of a large number of integers (they do not fit into memory)

I know the answer uses the median of the medians, but can someone explain how to do this?

+4
source share
3 answers

There are linear time algorithms for this, this page may be useful, http://en.wikipedia.org/wiki/Selection_algorithm , if you are still confused, just ask

Basically, how the selection algorithm works, it is similar to quicksort, but each time it is sorted only on the side of the rotation. The goal is to maintain the separation until you select a bar equal to the index of the element you were trying to find. Here is the java code I found for quickselect:

public static int selectKth(int[] arr, int k) { if (arr == null || arr.length <= k) throw new Error(); int from = 0, to = arr.length - 1; // if from == to we reached the kth element while (from < to) { int r = from, w = to; int mid = arr[(r + w) / 2]; // stop if the reader and writer meets while (r < w) { if (arr[r] >= mid) { // put the large values at the end int tmp = arr[w]; arr[w] = arr[r]; arr[r] = tmp; w--; } else { // the value is smaller than the pivot, skip r++; } } // if we stepped up (r++) we need to step one down if (arr[r] > mid) r--; // the r pointer is on the end of the first k elements if (k <= r) { to = r; } else { from = r + 1; } } return arr[k]; } 
+1
source

See the first two answers to this question . If the first (frequency counter) can work for your data / available storage, you can get an exact answer this way. The second (fixer) is a reliable general method.

0
source

Source: https://habr.com/ru/post/1479161/


All Articles