Fitting a Partial Gaussian

I am trying to fit a sum of Gaussians using scikit-learn, because scikit-learn's GaussianMixture seems much more robust than using curve_fit.

Problem: it does not do a good job of fitting a truncated part of even a single Gaussian peak:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm
    from sklearn import mixture

    clf = mixture.GaussianMixture(n_components=1, covariance_type='full')
    data = np.random.randn(10000)
    data = [[x] for x in data]
    clf.fit(data)

    data = [item for sublist in data for item in sublist]
    rangeMin = int(np.floor(np.min(data)))
    rangeMax = int(np.ceil(np.max(data)))
    # density=True replaces normed=True, which newer Matplotlib no longer accepts
    h = plt.hist(data, range=(rangeMin, rangeMax), density=True)
    # scipy.stats.norm.pdf replaces matplotlib.mlab.normpdf, which was removed from Matplotlib
    xs = np.linspace(rangeMin, rangeMax)
    plt.plot(xs, norm.pdf(xs, clf.means_[0, 0], np.sqrt(clf.covariances_[0, 0, 0])))

gives the expected result [image not available]. Now changing data = [[x] for x in data] to data = [[x] for x in data if x < 0] in order to truncate the distribution returns a poor fit [image not available]. Any ideas how to get the truncated data fitted properly?

Note: the distribution is not necessarily truncated in the middle; anywhere between 50% and 100% of the full distribution may remain.

I would also be happy if someone could point me to alternative packages. I only tried curve_fit, but could not get it to do anything useful once more than two peaks were involved.

2 answers

The scipy stats module has a dedicated distribution for truncated Gaussian data, named truncnorm. It matched your example truncated data, as expected, when I tried it. If you can use it, see the (rather sparse) documentation for truncnorm at:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.truncnorm.html
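A minimal sketch of how such a fit could look, assuming the truncation point is known (here `upper = 0`, as in the question's example). The helper `neg_log_likelihood`, the log-sigma parametrization, and the choice of Nelder-Mead are illustrative assumptions, not something taken from the truncnorm documentation. The one detail to watch is that truncnorm expects its bounds `a` and `b` in standardized units, so they have to be recomputed from the current `mu` and `sigma` at every step of the optimization:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(42)
samples = rng.standard_normal(10000)

upper = 0.0  # assumption: the truncation point is known in advance
truncated = samples[samples < upper]


def neg_log_likelihood(params, x, upper):
    """Negative log-likelihood of a normal truncated above at `upper`."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimizing log(sigma) keeps sigma positive
    # truncnorm takes its bounds relative to loc/scale, not in data units
    a = -np.inf
    b = (upper - mu) / sigma
    return -np.sum(stats.truncnorm.logpdf(x, a, b, loc=mu, scale=sigma))


# start from the (biased) moments of the truncated sample
start = [truncated.mean(), np.log(truncated.std())]
result = optimize.minimize(
    neg_log_likelihood, start, args=(truncated, upper), method="Nelder-Mead"
)

mu_hat = result.x[0]
sigma_hat = np.exp(result.x[1])
print("estimated mean:", mu_hat)
print("estimated std: ", sigma_hat)
```

The same idea should carry over to a truncation anywhere between 50% and 100% of the distribution by changing `upper`. With only half of the peak available, the mean and width are strongly correlated, so the estimates are noticeably noisier than for untruncated data.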


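A somewhat crude but simple solution would be to keep only one half of the curve ( data = [[x] for x in data if x < 0] ), mirror that left half onto the right side ( data.append([-data[d][0]]) ), and then perform a regular Gaussian fit. Note that this only works when the truncation point coincides with the peak.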

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm
    from sklearn import mixture

    np.random.seed(seed=42)
    n = 10000
    clf = mixture.GaussianMixture(n_components=1, covariance_type='full')

    # split the data and mirror it
    data = np.random.randn(n)
    data = [[x] for x in data if x < 0]
    n = len(data)
    for d in range(n):
        data.append([-data[d][0]])
    clf.fit(data)

    data = [item for sublist in data for item in sublist]
    rangeMin = int(np.floor(np.min(data)))
    rangeMax = int(np.ceil(np.max(data)))
    # histogram of the original (left) half only; density=True replaces normed=True
    h = plt.hist(data[0:n], bins=20, range=(rangeMin, rangeMax), density=True)
    xs = np.linspace(rangeMin, rangeMax)
    # factor 2: the half histogram is normalized to area 1, the fitted Gaussian spans both halves
    plt.plot(xs, norm.pdf(xs, clf.means_[0, 0], np.sqrt(clf.covariances_[0, 0, 0])) * 2)
    plt.show()


Source: https://habr.com/ru/post/1014539/

