Estimation of probability distribution and sampling from it in Julia

I am trying to use Julia to estimate a continuous one-dimensional distribution from N observed data points (stored as an array of Float64 numbers), and then sample from the estimated distribution. I have no prior knowledge that would restrict attention to a particular family of distributions.

I was thinking about using the KernelDensity package to estimate the distribution, but I'm not sure how to sample from the result.

Any help / advice would be greatly appreciated.

1 answer

Without any restriction on the underlying distribution, the empirical distribution function is a natural candidate (see Wikipedia). It comes with strong theorems on convergence to the true distribution (see the Dvoretzky-Kiefer-Wolfowitz inequality).
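As a minimal sketch of what this means in practice, the empirical distribution function and the width of the Dvoretzky-Kiefer-Wolfowitz confidence band can be computed with Base Julia alone. The names `ecdf_at`, `band` and `alpha`, and the synthetic `data`, are illustrative assumptions and not part of the question.

```julia
using Random

Random.seed!(1)
data = randn(1_000)            # synthetic observations, for illustration only

sorted = sort(data)
n = length(sorted)

# Empirical CDF: fraction of observations less than or equal to x.
ecdf_at(x) = searchsortedlast(sorted, x) / n

# DKW inequality: P(sup_x |F_n(x) - F(x)| > e) <= 2 * exp(-2 * n * e^2).
# Solving 2 * exp(-2 * n * e^2) = alpha for e gives the band half-width:
# with probability at least 1 - alpha the true CDF lies within `band` of F_n.
alpha = 0.05
band = sqrt(log(2 / alpha) / (2n))

println("F_n(0.0) = ", ecdf_at(0.0))
println("DKW band half-width at alpha = ", alpha, ": ", band)
```

The band shrinks like 1/sqrt(n) and does not depend on the unknown distribution, which is why the empirical distribution is a safe default when nothing else is known.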

With this choice, sampling is especially simple. If dataset is a list of the observed samples, then dataset[rand(1:length(dataset),sample_size)] is a collection of new samples from the empirical distribution. With Distributions, this can be written more readably, for example:

 using Distributions
 new_sample = sample(dataset, sample_size)

Finally, a kernel density estimate is also a good choice, but it requires choosing parameters (the kernel and its width), which expresses a preference for some family of distributions. Sampling from a kernel density estimate is surprisingly similar to sampling from the empirical distribution: 1. Draw a sample from the empirical distribution; 2. Perturb each sampled point by adding a draw from the kernel function.

For example, if the kernel function is a normal distribution of width (standard deviation) w, then the perturbed sample can be computed as:

 new_sample = dataset[rand(1:length(dataset),sample_size)]+w*randn(sample_size) 
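The two steps above can be wrapped into a small self-contained function. The sketch below is one possible way to do it, using only the standard library. The helper names `sample_gaussian_kde` and `silverman_bandwidth` are hypothetical, the data are synthetic, and Silverman's rule of thumb is an assumed default for w that the answer itself does not prescribe.

```julia
using Random, Statistics

# Draw m samples from a Gaussian kernel density estimate built on `data`.
# `w` is the kernel standard deviation (the bandwidth).
function sample_gaussian_kde(rng::AbstractRNG, data::AbstractVector{<:Real},
                             w::Real, m::Integer)
    centers = rand(rng, data, m)            # step 1: resample with replacement
    return centers .+ w .* randn(rng, m)    # step 2: perturb with kernel noise
end

# Silverman's rule of thumb for a Gaussian kernel (an assumption made here;
# any other bandwidth selector could be substituted).
function silverman_bandwidth(data::AbstractVector{<:Real})
    n = length(data)
    iqr = quantile(data, 0.75) - quantile(data, 0.25)
    return 0.9 * min(std(data), iqr / 1.34) * n^(-1/5)
end

rng = MersenneTwister(42)
# Synthetic bimodal data, for illustration only.
data = vcat(randn(rng, 500) .- 2.0, randn(rng, 500) .+ 2.0)

w = silverman_bandwidth(data)
new_sample = sample_gaussian_kde(rng, data, w, 10_000)

println("bandwidth w = ", w)
println("sample mean = ", mean(new_sample), ", data mean = ", mean(data))
```

Passing the random number generator explicitly keeps the draws reproducible. Because the noise is independent of the resampled points, the variance of the new sample is roughly var(data) + w^2, so a large w visibly smooths and widens the estimated distribution.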

Source: https://habr.com/ru/post/1258486/

