Estimation of probability distribution and sampling from it in Julia

Question


I am trying to use Julia to estimate a continuous one-dimensional distribution from N observed data points (stored as an array of Float64 numbers), and then sample from this estimated distribution. I have no prior knowledge that would restrict attention to a particular family of distributions.

I was thinking about using the KernelDensity package to estimate the distribution, but I'm not sure how to sample from the result.

Any help / advice would be greatly appreciated.


distribution julia-lang kernel-density

Chai Oct 19 '16 at 14:54


1 answer

Dan getz · Accepted Answer · 2016-10-19T18:45:18+0000

Without any restriction on the family of distributions, the empirical distribution function is a natural candidate (see Wikipedia). For this estimator there are strong theorems on convergence to the true distribution (see the Dvoretzky-Kiefer-Wolfowitz inequality).
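To make the convergence statement concrete, here is a minimal sketch in Base Julia that evaluates the empirical distribution function and the half-width of the confidence band implied by the Dvoretzky-Kiefer-Wolfowitz inequality, P(sup |F_n - F| > e) <= 2 exp(-2 n e^2). The helper names `ecdf_at` and `dkw_halfwidth`, the placeholder data, and the level `alpha = 0.05` are assumptions made for illustration, not part of the original answer.

```julia
# Empirical CDF and a DKW confidence band, using only Base Julia.
# `ecdf_at` and `dkw_halfwidth` are hypothetical helper names.

# Fraction of observations less than or equal to x (input must be sorted).
ecdf_at(sorted_data, x) = searchsortedlast(sorted_data, x) / length(sorted_data)

# Solve 2 * exp(-2 * n * e^2) = alpha for e: with probability at least
# 1 - alpha, the empirical CDF is within e of the true CDF everywhere.
dkw_halfwidth(n, alpha) = sqrt(log(2 / alpha) / (2n))

data = randn(1_000)            # placeholder observations
sorted_data = sort(data)

halfwidth = dkw_halfwidth(length(data), 0.05)
Fn0 = ecdf_at(sorted_data, 0.0)
println("F_n(0) = ", Fn0, " +/- ", halfwidth)
```

The band shrinks like 1/sqrt(n), so quadrupling the number of observations halves its width.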

With this choice, sampling is especially simple. If dataset is an array of the observed samples, then dataset[rand(1:length(dataset),sample_size)] is a collection of new samples drawn with replacement from the empirical distribution. With the Distributions package, this can be written more readably, for example:

 using Distributions
 new_sample = sample(dataset, sample_size)

Finally, a kernel density estimate is also a good option, but it requires choosing parameters (the kernel and its bandwidth), which expresses a preference for some family of distributions. Sampling from a kernel density estimate is surprisingly similar to sampling from the empirical distribution: 1. Draw a sample from the empirical distribution; 2. Perturb each sample by adding a draw from the kernel function.

For example, if the kernel is a normal distribution with standard deviation (bandwidth) w, then the perturbed sample can be calculated as:

 new_sample = dataset[rand(1:length(dataset), sample_size)] .+ w .* randn(sample_size)
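The two-step recipe above can be wrapped in a small function. The following is only a sketch under stated assumptions: the function names `silverman_bandwidth` and `sample_gaussian_kde` are hypothetical, the data are placeholders, and the bandwidth is chosen with Silverman's rule of thumb, which is one common default rather than something the answer prescribes. If the density was fitted with a package such as KernelDensity, `w` should instead match the bandwidth used there.

```julia
using Random, Statistics

# Silverman's rule-of-thumb bandwidth for a Gaussian kernel
# (an assumed default; any other bandwidth choice can be passed instead).
function silverman_bandwidth(data)
    iqr = quantile(data, 0.75) - quantile(data, 0.25)
    spread = min(std(data), iqr / 1.34)
    return 0.9 * spread * length(data)^(-1 / 5)
end

# Draw `sample_size` points from a Gaussian kernel density estimate of `data`.
function sample_gaussian_kde(rng, data, w, sample_size)
    centers = rand(rng, data, sample_size)           # step 1: resample the data
    return centers .+ w .* randn(rng, sample_size)   # step 2: add kernel noise
end

rng = MersenneTwister(42)                 # seeded for reproducibility
dataset = randn(rng, 500) .* 2 .+ 10      # placeholder observations
w = silverman_bandwidth(dataset)
new_sample = sample_gaussian_kde(rng, dataset, w, 10_000)
```

Setting `w` to zero reduces this to plain resampling from the empirical distribution, which is a convenient sanity check.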
