How to choose one of n objects at random without first knowing?

For concreteness, how would you read a text line and select and print one random line if you do not know the number of lines in advance?

Yes, this is a problem from the pearl of programming, with which I am confused.

The solution selects the 1st element, then select the second with a probability of 1/2, the third with 1/3, etc.

Algorithm:

i = 0 while more input lines with probability 1.0/++i choice = this input line print choice 

Suppose the final choice is the third element, the probability is 1 x 1/2 x 1/3 x 3/4 x ... x n-2 / n-1 x n-1 / n == 1 / 2n? But 1 / n must be correct.

+6
source share
5 answers

Your algorithm is correct, but there is no analysis. You can prove it by induction. Loosely: It works for N = 1, of course. If it works up to N-1, then what happens at N? The chance that the Nth element is selected and overwrites the last selection is - 1 / N - good. The probability that this is not so (N-1) / N. In this case, the selection of the previous step is used. But at that moment all the elements had a 1 / (N-1) chance of being selected. Now it's 1 / N. Good.

+5
source

Your calculation is incorrect:

Suppose the final choice is the third element, the probability is 1 x 1/2 x 1/3 x 3/4 x ... x n-2 / n-1 x n-1 / n

Real probability:

(1 x 1/2 + 1 x 1/2) x 1/3 x 3/4 x ... x n-2 / n-1 x n-1 / n == 1 / n

since either you chose 2 or you did not choose 2 (for choice 2 it has a sample of 1/2, not 2)

+1
source

Method 1. Take the first pass to determine how many lines there are. Then randomize it.

Method 2: Use the method you mentioned. It works, but it does not give every line an equal chance of choice.

As far as I can tell, it’s impossible to configure method 2 to give each line the same chance of choice. It is mathematically possible that the number of lines will be unlimited indefinitely.

0
source

Read 1 Read 2 50% chance to either save or drop. Read 3 (we should have 1 or 3 or 2 and 3). 50% chance of any of the lines, refuse another. Continue to work with a 50% chance throughout the entire file, this leaves you with 2 lines. Take 50/50 on any of the lines, and you have a random string. The odds were even for the whole file.

0
source

This is no coincidence - since you most likely selected the line at the beginning of the file. You need to know the number of rows to make it random. (50% of the time you get in the first line!)

-1
source

Source: https://habr.com/ru/post/896382/


All Articles