Sampling from a Zipf distribution over a restricted domain

I would like to sample from a Zipf distribution over a bounded domain.

That is, suppose the domain is {1, ..., N}. I would like each element i of the domain to be selected with probability proportional to i ** -a, where a is the distribution parameter.

numpy provides a zipf sampler (numpy.random.zipf), but it does not allow me to limit the domain.

How can I easily sample from such a distribution?


If the distribution parameter a is greater than 1, I can use the numpy sampler, rejecting (and re-sampling) all samples greater than N. However, since it does not restrict the sampling range, trying to use any smaller value of a does not work.
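For reference, the rejection approach described above could be sketched roughly as follows. The function name bounded_zipf_rejection and its arguments are hypothetical, chosen only for illustration. It relies on numpy's Generator.zipf, which only accepts a > 1, so it does not cover the case this question is really about.

```python
import numpy as np

def bounded_zipf_rejection(a, N, size, rng=None):
    """Draw `size` samples from {1, ..., N} with P(i) proportional to i ** -a.

    Hypothetical helper: only usable for a > 1, because numpy's
    unbounded Zipf sampler requires a > 1.
    """
    if a <= 1:
        raise ValueError("rejection from the unbounded Zipf sampler needs a > 1")
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(size, dtype=np.int64)
    filled = 0
    while filled < size:
        # Draw a batch from the unbounded distribution ...
        batch = rng.zipf(a, size=size - filled)
        # ... and reject every value that falls outside {1, ..., N}.
        batch = batch[batch <= N]
        out[filled:filled + batch.size] = batch
        filled += batch.size
    return out
```

Calling bounded_zipf_rejection(2.0, 7, 1000) returns 1000 integers between 1 and 7. The closer a is to 1 and the smaller N is, the more draws are rejected.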

When the domain is finite, there should be no problem using such values of a, and this is what I need for my application.

2 answers

Using scipy.stats, you can create your own discrete distribution:

bounded_zipf = stats.rv_discrete(name='bounded_zipf', values=(x, weights))

For example,

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

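N = 7                      # size of the bounded domain {1, ..., N}
x = np.arange(1, N+1)
a = 1.1                    # distribution parameter
weights = x ** (-a)        # unnormalized probabilities i ** -a
weights /= weights.sum()   # normalize so the probabilities sum to 1
bounded_zipf = stats.rv_discrete(name='bounded_zipf', values=(x, weights))

sample = bounded_zipf.rvs(size=10000)
plt.hist(sample, bins=np.arange(1, N+2))   # one bin per domain value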
plt.show()

gives a histogram of the sample [figure: histogram of the sampled values].
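The same bounded distribution can also be sampled with numpy alone, by inverting the cumulative distribution. The following is a minimal sketch under that assumption, not part of the original answer. The function name bounded_zipf_sample is hypothetical. Because the weights are normalized over a finite domain, it works for any a, including a <= 1.

```python
import numpy as np

def bounded_zipf_sample(a, N, size, rng=None):
    """Sample from {1, ..., N} with P(i) proportional to i ** -a (any real a)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.arange(1, N + 1)
    # Unnormalized weights; float(a) avoids integer-power errors for integer a.
    weights = x ** (-float(a))
    # Cumulative distribution, scaled so the last entry is exactly 1.
    cdf = np.cumsum(weights)
    cdf /= cdf[-1]
    # Inverse-CDF step: find the first index whose cdf value exceeds u.
    u = rng.random(size)
    idx = np.searchsorted(cdf, u, side="right")
    return x[idx]
```

For example, bounded_zipf_sample(0.5, 7, 10000) draws 10000 values from {1, ..., 7}. Setup costs O(N) and each sample costs O(log N), which may be enough when N is moderate.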


If sampling performance is a problem, you can implement your own sampler based on rejection-inversion sampling. A suitable Java implementation can be found here.


Source: https://habr.com/ru/post/1613106/

