The choice of numbers from the set with equal probability

Got this question from Stephen Skien's algorithm design guide.

It is required to choose the values ​​of k (value) to form a subset S 'from a given set S with n numbers, so the probability of selection for each number is (k / n). n is unknown (I was thinking about taking S as a reference list for this). also we can only go through the set S.

+4
source share
3 answers

Something like that

for elem in S if random() < (k - S'.size)/S.size // This is float division S'.add(elem) 

The first element is selected with probability k/n , the second with (nk)/n * k/(n-1) + k/n * (k-1)/(n-1) , which reduces to k/n , etc. d.

+2
source

When n is unknown , you probably need an online algorithm for the so-called reservoir sampling .

Here are some good explanations and proofs of the sketches http://propersubset.com/2010/04/choosing-random-elements.html

I mean this algorithm implemented in Python (taken from the link above)

 import random def random_subset( iterator, K ): result = [] N = 0 for item in iterator: N += 1 if len( result ) < K: result.append( item ) else: s = int(random.random() * N) if s < K: result[ s ] = item return result 
+6
source

You have to choose an algorithm that can really simulate the actual "Randomly select k numbers from n numbers" activity. Your algorithm must have two properties

(1) He must return k numbers at the end.

(2) He should really imitate these properties of the target activity: each number is selected with probability k / n.

The rim s answer is wrong because it hasn t the first property.

 for i = 0 to n randomly choose an integer number between [1,n-i+1] if [randomValue <= (k - S'.size)/(S.size - i + 1)] then S'.add(S[i]) 

Using the above plan, each number is selected with probability k / n. You can see that with the proof below the equation:

https://www.facebook.com/photo.php?fbid=677984275648191&l=7cafe5d468

+1
source

Source: https://habr.com/ru/post/1386465/


All Articles