Random number for choosing winners based on probability

Imagine that you have an array of hashes representing a competitor and their probability of winning a prize (float from 0 to 1). How:

[ {:name => "Adam" , :prob => 0.5} {:name => "Ben" , :prob => 1.0} {:name => "Chris" , :prob => 0.1} {:name => "Daniel" , :prob => 0.2} {:name => "Ed" , :prob => 0.7} {:name => "Frey" , :prob => 0.5} {:name => "Gilbert" , :prob => 0.3} ] 

I would like to have an algorithm on which I can select three winners using random numbers, but respecting the probability of each person.

The overall probability of sampling is 3.3

A logical approach would be to calculate a random value like:

 val = rand(33)/10.0 

And scan the array until I get a person who reaches a random number.

This approach works, but it involves scanning in an array.

I wonder if there will be a simpler solution. Any ideas?

PS: Imagine that an array can contain a large number of elements.

+3
generics random
Jan 31 2018-12-12T00:
source share
4 answers

I thought about this, and I think my result makes sense:

  • sort the vector by probability: [a = 0.1, b = 0.2, c = 0.3, d = 0.4]
  • select a random number (e.g. 0.5)
  • iterate from the beginning and sum the probability values ​​and stop when it is higher: answer = 0.1 + 0.2 + 0.3. So, 0.6> 0.5, we evaluate 'c'

my point is that the values ​​at the end of the vector should have a higher probability of selection. I implemented in python:

 values = [0.1,0.2,0.3,0.4] count_values = len(values)*[0] answer = len(values)*[0] iterations = 10000 for i in range(0,iterations): rand = float(random.randint(0,iterations))/iterations count = 0 sum = 0 while sum <= rand and count <= len(values): sum += values[count] count += 1 count_values[count-1]+=1 for i in range(0,len(count_values)): answer[i] = float(count_values[i])/iterations 

and executed several times, I calculated the probability of choosing all the elements that should correspond to our initial probability:

 [0.1043, 0.196, 0.307, 0.3927] [0.1018, 0.2003, 0.2954, 0.4025] [0.0965, 0.1997, 0.3039, 0.3999] 
+1
Jan 31 2018-12-12T00:
source share

A cycle is created that runs up to 3 winners. In this loop, generate a specific random number using any random method available in your chosen programming language. After that, start iterating through users. If any probability of the user is less than this random number, accept this user as the winner. If the winner is not selected in any iteration of the cycle, for example, in the case when the least probability in your list is 0.2 and the random number generated is 0.1, then continue the next iteration of the cycle. Get out of the loop when you get 3 winners. The likely pseudo-code for this may be as follows:

 int count=0; while(count<3){ temp=GenerateRandomNumber() int userIndex= AcceptWinner(UserListProbability,temp) //here keep iterating through the users to check which user probability is less than temp and returns the index of the winner in the List if(userIndex==-1)//No winner selected continue; else{ count++; Print List(userIndex) } } 

Note: the list must be sorted.

+2
Jun 03 '14 at 12:12
source share

I assume that in your example, β€œprobability” means β€œweight” (so people with a probability of 1.0 do not guarantee victory and the total probability will not be 1.0)

You can create a tree of nodes in which leaf nodes contained your individual records:

 leaf1 = {:name => "Adam" , :prob => 0.5} leaf2 = {:name => "Ben" , :prob => 1.0} 

and each node contained the sum of the nodes below it

 node1 = { :prob_sum => 1.5 , :children=> [ leaf1, leaf2] } 

The root node then contains the sum of the entire structure

 root_node = { :prob_sum => 33 , :children => [ leaf9, leaf10] } 

Then you pick a random number between zero and the sum contained in the root of the node.

 my_random = rand( root_node.prob_sum ) 

Then cross the tree. Each node contains the sum of all the nodes below it, so if your random number is greater than node, you subtract the value of node and skip this branch.

 def find_node( my_random ): c = children.first() while( c ): if ( c.prob_sum < my_random ): return c.find_node(my_random) my_random -= c.prob_sum c = c.next 

Assuming you have built a balanced tree, you should get the results in O (log n)

Alternatively, you can get the same result by adding the current general field to your current dataset and performing a binary search based on this total. This would probably be easier to implement, but only applicable if your working set can fit in memory.

0
Jan 31 2018-12-12T00
source share

There is also an approach that works today, but has some problems.

Today I create an array and put the probability * 100 records for each person in this array.

Then you can make random direct contents of the array.

The first problem is that it is expensive in all aspects (memory, processing, ...) and does not scale.

The second problem that I experience when choosing the second and third person is when I either take the first one or I do a random cycle until another person is selected.

However, for small datasets (for example, I still have, but have increased over time) this solution works fine.

0
Jan 31 '12 at 8:59
source share



All Articles