How can I choose from a complex or complex distribution in Haskell?

Question

How can I choose from a complex or complex distribution in Haskell?

I am trying to create random masses for hypothetical planets in Haskell. I want to produce these masses by taking a sample of the bimodal distribution (ideally, a superposition of two normal distributions: one corresponds to small planets and corresponds to gas giants). I looked at the statistics package , which provides a quantile function that can turn a uniformly distributed Double into Double on a number of distributions. But there seems to be no support for compiling distributions.

This particular case can be hacked by choosing one distribution kit or another sample in advance, but I would like to do this with one distribution kit, especially since I may need to configure a common distribution later. In the end, I could replace the normal distribution with real data from sky surveys.

I am considering the option of doing rejection sampling myself, which can handle arbitrary distributions quite simply, but it seems rather inefficient, and it certainly won't be a good idea to implement it if the solution already exists as a library.

Is there a Haskell library that supports fetching from compiled or explicitly defined distributions? Or an existing Haskell implementation of selective sampling? Alternatively, is there an explicit formula for the inverse of the CDF of the sum of two normal distributions?

+6

statistics normal-distribution haskell sampling procedural-generation

interfect May 31 '12 at 3:53

source share

2 answers

Hmmm. The best way I know is to adapt the MonadRandom package to get a "probabilistic monad", borrowing some tools from http://en.wikipedia.org/wiki/Normal_distribution#Generating_values_from_normal_distribution :

 getRandomStrictlyBetween :: (Ord a, Random a, RandomGen m) => (a, a) -> a getRandomStrictlyBetween (lo, hi) = do x <- getRandomR (lo, hi) -- x is uniformly randomly chosen from the *closed* interval if lo < x && x < hi then return x else getRandomStrictlyBetween (lo, hi) normalValue :: MonadRandom m => m Double normalValue = do u <- getRandomStrictlyBetween (0, 1) v <- getRandomStrictlyBetween (0, 2 * pi) return (sqrt (-2 * log u) * cos v) -- according to Wikipedia

and then you can get more or less arbitrary distributions; for example, to get the distribution of a random variable y with probability p and z with probability (1 - p) , you simply write

 do alpha <- getRandom -- double chosen from [0, 1) if alpha < p then y else z

of which bimodal distributions seem to be a special case. To try out of these distributions, just make an evalRandIO distribution for the sample in the IO monad.

+3

Louis Wasserman May 31 '12 at 8:01

source share

jtobin · Accepted Answer · 2012-05-31T08:27:00+0000

In the case of a simple mix of distributions, you can get an effective sampler using the “hack” that you first mentioned:

This particular case can be hacked by choosing one distribution kit or another sample in advance, but I would like to do it using one distribution kit, especially since I may need to configure the general distribution later.

This is actually a case of a Gibbs sample, which is very common in statistics. It is very flexible, and if you know the number of mixtures that you use, it will probably be difficult to beat. Choose one individual distribution from the entire ensemble to the sample, and then a sample from this conditional distribution. Rinse and repeat.

Here's a simple, non-optimized Haskell implementation for a sampler mixture of a Gaussian Gibbs. This is pretty simple, but you get the idea:

 import System.Random import Control.Monad.State type ModeList = [(Double, Double)] -- A list of mean/stdev pairs, for each mode. -- Generate a Gaussian (0, 1) variate. boxMuller :: StdGen -> (Double, StdGen) boxMuller gen = (sqrt (-2 * log u1) * cos (2 * pi * u2), gen'') where (u1, gen') = randomR (0, 1) gen (u2, gen'') = randomR (0, 1) gen' sampler :: ModeList -> State StdGen Double sampler modeInfo = do gen <- get let n = length modeInfo (z0, g0) = boxMuller gen (c, g1) = randomR (0, n - 1) g0 -- Sample from the components. (cmu, csig) = modeInfo !! c put g1 return $ cmu + csig * z0 -- Sample from the conditional distribution.

Here is an example: sampling 100 times from a one-dimensional mixture of two Gaussians. The modes have values x = -3 and x = 2.5 , and each component of the mixture has its own separate dispersion. You can add as many modes as you want.

 main = do let gen = mkStdGen 42 modeInfo = [(2.5, 1.0), (-3, 1.5)] samples = (`evalState` gen) . replicateM 100 $ sampler modeInfo print samples

Here's a smoothed plot of the density of these 100 samples (using R and ggplot2):

A more versatile algorithm will be sampling of deviation or importance, and in the case of more complex distributions, you probably want to manually perform the corresponding MCMC procedure. Here is a good introduction to Monte Carlo and the MCMC.

How can I choose from a complex or complex distribution in Haskell?

More articles: