Creating MLE for a pair of distributions in python

Ok, so my current curve-fitting code has a step that uses scipy.stats to determine the correct distribution based on the data:

distributions = [st.laplace, st.norm, st.expon, st.dweibull, st.invweibull, st.lognorm, st.uniform]

mles = []
for distribution in distributions:
    pars = distribution.fit(data)
    mle = distribution.nnlf(pars, data)   # negative log-likelihood at the fitted parameters
    mles.append(mle)

results = [(distribution.name, mle) for distribution, mle in zip(distributions, mles)]

for dist in sorted(zip(distributions, mles), key=lambda d: d[1]):
    print(dist)

best_fit = sorted(zip(distributions, mles), key=lambda d: d[1])[0]
print('Best fit reached using {}, MLE value: {}'.format(best_fit[0].name, best_fit[1]))
print([mod[0].name for mod in sorted(zip(distributions, mles), key=lambda d: d[1])])

Here, data is a list of numerical values. So far this works great for fitting unimodal distributions, which I confirmed with a script that randomly generates samples from random distributions and uses curve_fit to recover the parameters.

[figure: fitted normal distribution]
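For reference, a simplified sketch of what that check looks like (the sample size, the seed and the choice of st.norm as the "true" distribution below are just placeholders, not the actual test script):

import scipy.stats as st
import numpy as np

# draw a synthetic sample from a known distribution, then check that the
# nnlf ranking above picks it out as the best-fitting family
np.random.seed(0)
data = st.norm.rvs(loc=10, scale=2, size=1000)

distributions = [st.laplace, st.norm, st.expon, st.dweibull, st.invweibull, st.lognorm, st.uniform]
mles = [dist.nnlf(dist.fit(data), data) for dist in distributions]

for dist, mle in sorted(zip(distributions, mles), key=lambda d: d[1]):
    print(dist.name, mle)
# the normal distribution should come out at (or very near) the top of this ranking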

Now I would like to extend the code so it can also handle bimodal distributions, like the example below:

[figure: normal and exponential distribution combined]

Is it possible to get an MLE for a pair of models from scipy.stats, in order to determine whether some pair of distributions is a good fit for the data? Something like

distributions = [st.laplace, st.norm, st.expon, st.dweibull, st.invweibull, st.lognorm, st.uniform]
distributionPairs = [[modelA.name, modelB.name] for modelA in distributions for modelB in distributions]

and then use these pairs to get the MLE value of that pair of distributions fitted to the data?

1 answer

This is not a complete answer, but it may help you solve your problem. Suppose you know that your data were generated by a mixture of two densities. A solution would be to use a k-means- or EM-style algorithm.

Initialization. You initialize the algorithm by assigning each observation to one of the two densities, and you initialize the two densities themselves (you initialize the density parameters, and one of the "parameters" in your case is the family: Gaussian, Laplace, etc.).

Iteration. Then, iteratively, you repeat the following two steps:

Step 1. Optimize the parameters, assuming that the assignment of each point is correct. You can use any optimizer here. This step gives you an estimate of the best two densities (with their parameters) that fit your data.

Step 2. Reassign each observation to one density or the other, according to which one gives it the higher likelihood.

Repeat until convergence.

This is very well explained on this webpage https://people.duke.edu/~ccc14/sta-663/EMAlgorithm.html

If you do not know how many densities generated your data, the problem is harder: you then have to deal with a penalized classification problem, which is a bit more complicated.

Here is an example of code for the simple case: you know that your data come from two different Gaussians (but you don't know how many observations were generated from each density). In your case, you can adapt this code to loop over every possible pair of densities (computationally longer, but it would work empirically, I suppose):

import scipy.stats as st
import numpy as np

# hard-coded data generation: a mixture of two Gaussians
data = np.random.normal(-3, 1, size=1000)
data[600:] = np.random.normal(loc=3, scale=2, size=400)

# initialization of the two densities' parameters
mu1 = -1
sigma1 = 1
mu2 = 1
sigma2 = 1

# criterion to stop iterating
epsilon = 0.1
stop = False

while not stop:
    # assign each point to the density that gives it the higher likelihood
    classification = np.zeros(len(data))
    classification[st.norm.pdf(data, mu1, sigma1) > st.norm.pdf(data, mu2, sigma2)] = 1

    mu1_old, mu2_old, sigma1_old, sigma2_old = mu1, mu2, sigma1, sigma2

    # re-estimate each density on the points assigned to it
    pars1 = st.norm.fit(data[classification == 1])
    mu1, sigma1 = pars1
    pars2 = st.norm.fit(data[classification == 0])
    mu2, sigma2 = pars2

    # stopping criterion: the parameters barely moved
    stop = ((mu1_old - mu1)**2 + (mu2_old - mu2)**2
            + (sigma1_old - sigma1)**2 + (sigma2_old - sigma2)**2) < epsilon

# result
print("The first density is gaussian:", mu1, sigma1)
print("The second density is gaussian:", mu2, sigma2)
print("A rate of", np.mean(classification), "is classified in the first density")
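To connect this back to the pair question: a rough sketch of that pair loop could look like the code below. The fit_pair helper, the median split used for initialisation, the fixed iteration cap and the reduced candidate list are illustrative choices of mine on top of the idea above, and ranking pairs by the summed nnlf ignores the number of parameters (a penalized criterion such as AIC/BIC would be fairer):

import itertools
import numpy as np
import scipy.stats as st

def fit_pair(data, distA, distB, n_iter=30):
    """Alternate hard assignment and per-component fitting for one pair of
    scipy.stats distributions; return the summed nnlf and the fitted parameters."""
    # crude initialisation: split the sample at its median
    classification = (data > np.median(data)).astype(int)
    nll, parsA, parsB = np.inf, None, None
    for _ in range(n_iter):
        groupA = data[classification == 1]
        groupB = data[classification == 0]
        if len(groupA) < 2 or len(groupB) < 2:          # one component collapsed
            return np.inf, None, None
        # fit each candidate density on the points currently assigned to it
        parsA = distA.fit(groupA)
        parsB = distB.fit(groupB)
        # total negative log-likelihood of this pair under the current assignment
        nll = distA.nnlf(parsA, groupA) + distB.nnlf(parsB, groupB)
        # reassign every point to whichever fitted density gives it the higher likelihood
        new_classification = (distA.pdf(data, *parsA) > distB.pdf(data, *parsB)).astype(int)
        if np.array_equal(new_classification, classification):
            break                                       # assignments are stable
        classification = new_classification
    return nll, parsA, parsB

# toy bimodal sample: a Gaussian bump plus an exponential tail
np.random.seed(0)
data = np.concatenate([np.random.normal(-3, 1, 600), np.random.exponential(2, 400)])

candidates = [st.laplace, st.norm, st.expon]            # reduced set, to keep the fits well-behaved
results = []
for distA, distB in itertools.combinations_with_replacement(candidates, 2):
    nll, parsA, parsB = fit_pair(data, distA, distB)
    results.append((nll, distA.name, distB.name))

for nll, nameA, nameB in sorted(results):
    print('{} + {}: total nnlf = {:.1f}'.format(nameA, nameB, nll))

Lower totals mean a better joint fit for that pair.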

Hope this helps.



