Sampling from a Zipf distribution over a bounded domain

Question

I would like to sample from a Zipf distribution over a bounded domain.

That is, suppose the domain is {1, ..., N}. I would like each element i in the domain to be selected with probability proportional to i ** -a, where a is the distribution parameter.

numpy provides a zipf sampler (numpy.random.zipf), but it does not allow me to limit the domain.

How can I easily sample from such a distribution?

If the distribution parameter a is greater than 1, I can use the numpy sampler, rejecting (and re-sampling) all samples larger than N. However, since numpy does not restrict the sampling range, this does not work for smaller values of a.

When the domain is finite, there should be no problem using such values of a, and this is what I need for my application.
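The rejection approach described above could be sketched roughly as follows. This is only an illustration: the function name `bounded_zipf_rejection` and its parameters are hypothetical, and it relies on numpy's `Generator.zipf`, which only accepts a > 1.

```python
import numpy as np

def bounded_zipf_rejection(a, N, size, rng=None):
    """Sample from Zipf(a) restricted to {1, ..., N} by rejection (needs a > 1)."""
    if a <= 1:
        raise ValueError("numpy's zipf sampler requires a > 1")
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(0, dtype=np.int64)
    while out.size < size:
        draw = rng.zipf(a, size=size)
        # Keep only the draws that fall inside the bounded domain
        kept = draw[(draw >= 1) & (draw <= N)]
        out = np.concatenate([out, kept])
    return out[:size]

sample = bounded_zipf_rejection(a=2.0, N=7, size=1000)
```

The acceptance rate drops as a approaches 1 or as N shrinks, so this is mainly useful when most of the unbounded distribution's mass already lies within {1, ..., N}.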

python probability sampling distribution

RB Oct 25 '15 at 14:56

2 answers

If sampling performance is a concern, you can implement your own sampler based on rejection-inversion sampling. You can find a suitable Java implementation here.

otmar Oct 27 '15 at 20:20

unutbu · Accepted Answer · 2015-10-25T15:08:58+0000

Using scipy.stats, you can create your own discrete distribution:

bounded_zipf = stats.rv_discrete(name='bounded_zipf', values=(x, weights))

For example,

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

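N = 7
x = np.arange(1, N+1)
a = 1.1
# Cast to float so that an integer a (e.g. a = 1) does not raise an error
weights = x.astype(float) ** (-a)
weights /= weights.sum()  # normalize so the probabilities sum to 1
bounded_zipf = stats.rv_discrete(name='bounded_zipf', values=(x, weights))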

sample = bounded_zipf.rvs(size=10000)
plt.hist(sample, bins=np.arange(1, N+2))
plt.show()

gives

[Figure: histogram of the 10000 samples over the values 1 to N]
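As a possible variant that is not part of the original answer, the same normalized weights can be passed to numpy's `Generator.choice`, which avoids the scipy dependency. The helper name `bounded_zipf_choice` is hypothetical. Because the domain is finite, the weights can be normalized for any a, including a <= 1.

```python
import numpy as np

def bounded_zipf_choice(a, N, size, rng=None):
    """Sample from {1, ..., N} with P(i) proportional to i ** -a."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.arange(1, N + 1)
    # Float cast avoids the integer-to-negative-integer-power error
    weights = x.astype(float) ** (-a)
    weights /= weights.sum()
    return rng.choice(x, size=size, p=weights)

# a = 0.5 is below 1, which numpy's unbounded zipf sampler would reject
sample = bounded_zipf_choice(a=0.5, N=7, size=10000)
```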
