How to return an element's index with probability equal to the element's value divided by the sum of the array

Given an array, write a function that returns the index of the element whose value is k with probability k / sum(input array). Assume there are no duplicate numbers in the input array.

For example, if the input array is [1, 4, 2, 3], the function should behave as follows:

return 0 with probability 1/10;

return 1 with probability 4/10;

return 2 with probability 2/10;

return 3 with probability 3/10.

Question 2: How should this be handled if there are duplicates in the array?

I thought binary search would be a good way to find an element in an array, but I did not understand how to relate it to probability.

Edit: As suggested, this question is similar to mine. However, its solution was not what I expected. I was looking for a solution with binary search built in, which potentially reduces the time complexity.

The key to a good solution is knowing how to use binary search to find the first element that is larger than a given key in a sorted array.
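For reference, a minimal sketch of that primitive (the name first_greater is mine, not from any answer below), assuming a sorted list:

def first_greater(sorted_vals, key):
    # Leftmost index i with sorted_vals[i] > key; len(sorted_vals) if none.
    lo, hi = 0, len(sorted_vals)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_vals[mid] > key:
            hi = mid        # mid might be the answer; keep searching left
        else:
            lo = mid + 1    # everything up to and including mid is <= key
    return lo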

+4
4 answers

You can build a cumulative array B from the input, where B[i] = A[0] + A[1] + ... + A[i]. Generate a random integer x between 1 and sum(A), then binary-search B for the first element that is at least x.

In Python (using the standard bisect module):

import random, bisect, collections

def make_random(A):
    s = sum(A)
    # B[i] = A[0] + ... + A[i] (cumulative sums)
    B = list(A)
    for i in range(1, len(B)):
        B[i] += B[i-1]
    def fn():
        r = random.randint(1, s)
        # first index whose cumulative sum is >= r
        return bisect.bisect_left(B, r)
    return fn

rnd = make_random([1, 4, 2, 3])

c = collections.Counter()
for i in range(10000):
    c[rnd()] += 1

print(c)

Output:

Counter({1: 3960, 3: 3036, 2: 1992, 0: 1012})
+1

Compute the sum S of the array, then pick a random integer r between 1 and S. Walk through the array: if r <= a_i, return index i; otherwise subtract a_i from r and continue. Each index i is then returned with probability a_i / S. No preprocessing is needed, but each draw takes linear time.

EDIT (why this gives k / sum): keep a running total x, starting from x = 0, and add each a_i to x as you scan. The element with a_i = k is returned exactly when r lands in the interval (x, x + k], which has width k, so the probability is k / sum. The same argument applies to every a_i individually, so duplicates are handled as well.
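A minimal sketch of this scan (the name weighted_index is mine; no code survived in the original answer), assuming positive integer weights:

import random

def weighted_index(a):
    # Return index i with probability a[i] / sum(a); O(n) per draw.
    r = random.randint(1, sum(a))
    for i, w in enumerate(a):
        if r <= w:
            return i
        r -= w

Unlike the cumulative-array approach, this needs no O(n) preprocessing, which is the better trade-off when you only sample once.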

+1

We want each index whose value is k to be returned with probability k / sum, i.e., to sample from the discrete distribution defined by the array.

Draw a random integer r from [1, sum]. Build the cumulative distribution cum_distr, then return the first index i with r <= cum_distr[i].

import random


def get_cum_distr(distr):
    # cum_distr[i] = distr[0] + ... + distr[i]
    cum_distr = []
    total = 0
    for weight in distr:
        total += weight
        cum_distr.append(total)
    return cum_distr


def sampler(cum_distr):
    # Linear scan for the first index whose cumulative sum reaches r.
    r = random.randint(1, cum_distr[-1])
    i = 0
    while r > cum_distr[i]:
        i += 1
    return i


distr = [1, 4, 2, 3]
cum_distr = get_cum_distr(distr)
# test sampler
sample_size = 100000
count = dict()
for _ in range(sample_size):
    r = sampler(cum_distr)
    if r in count:
        count[r] += 1
    else:
        count[r] = 1
print(count)
# {0: 9996, 1: 40115, 2: 19934, 3: 29955}

Note: if you sample repeatedly, build cum_distr once and reuse it; the linear scan can also be replaced with a binary search over cum_distr.
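A sketch of that binary-search variant (the name sampler_bisect is mine), reusing cum_distr from the code above:

import bisect
import random

def sampler_bisect(cum_distr):
    # Same distribution as sampler(), but O(log n) per draw.
    r = random.randint(1, cum_distr[-1])
    return bisect.bisect_left(cum_distr, r)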


0

This looks like a naive linear scan (and in fact it is), but the order in which the elements are examined matters. With the largest weights placed first, the loop often terminates after just a couple of iterations, so if the distribution is heavily skewed this method can be faster on average.

[I used this trick to sample from the stochastic vectors in Wakkerbot's Markov nodes.]

#include <stdio.h>
#include <stdlib.h>

/* Elements sorted by descending weight so the scan tends to exit early. */
struct samp {
    int ret;         /* index to return */
    unsigned weight; /* relative probability of returning it */
} array[4] = { {1, 4}, {3, 3}, {2, 2}, {0, 1} };

unsigned sumweight = 10;

/* this is a *terrible* way to obtain a uniform random value */
#define urand(n) (random() % (n))

int sample(void)
{
    unsigned idx, val;

    val = urand(sumweight);

    /* Subtract weights until val falls inside an element's interval. */
    for (idx = 0; idx < 4; idx++) {
        if (val < array[idx].weight) return array[idx].ret;
        val -= array[idx].weight;
    }
    return -1; /* unreachable when the weights sum to sumweight */
}

int main(void)
{
    int ret;
    unsigned loop;

    for (loop = 0; loop < 20; loop++) {
        ret = sample();
        printf("%u: %d\n", loop, ret);
    }
    return 0;
}
0

Source: https://habr.com/ru/post/1609621/

