Lognormal distribution in python

Question


I have seen several questions on Stack Overflow about how to fit a log-normal distribution. There are still two things I need clarified.

I have a sample of data whose logarithm follows a normal distribution, so I can fit the data using scipy.stats.lognorm.fit (i.e. a log-normal distribution).

The fit works fine and also gives me the standard deviation. Here is my code snippet with the results.

    sample = np.log10(data)  # taking the log10 of the data
    scatter, loc, mean = stats.lognorm.fit(sample)  # gives the parameters of the fit
    x_fit = np.linspace(13.0, 15.0, 100)
    pdf_fitted = stats.lognorm.pdf(x_fit, scatter, loc, mean)  # gives the PDF
    print("scatter for data is %s" % scatter)
    print("mean of data is %s" % mean)

[image]

RESULT

    scatter for data is 0.186415047243
    mean for data is 1.15731050926

From the image you can clearly see that the mean is around 14.2, but what I get is 1.15. Why is this so? Clearly log(mean) is not near 14.2 either.

THIS POST and THIS QUESTION mention that log(mean) is the actual mean.

But as you can see from my code above, I obtained the fit using sample = log(data), and it also seems to fit well. However, when I tried

    sample = data
    pdf_fitted = stats.lognorm.pdf(x_fit, scatter, loc, np.log10(mean))

the fit does not seem to work.

1) Why is the mean not equal to 14.2?

2) How do I shade or draw vertical lines showing the sigma confidence region?


python scipy statistics

Thepredator Oct 16 '14 at 13:45


1 answer

Warren Weckesser · Accepted Answer · 2014-10-18T17:51:14+0000

You say:

I have a sample data whose logarithm follows a normal distribution.

Suppose data is an array containing the samples. To fit this data to a log-normal distribution using scipy.stats.lognorm, use:

    s, loc, scale = stats.lognorm.fit(data, floc=0)

Now suppose mu and sigma are the mean and standard deviation of the underlying normal distribution. To get an estimate of these values from this fit, use:

    estimated_mu = np.log(scale)
    estimated_sigma = s

(These are not estimates of the mean and standard deviation of the samples in data. See the Wikipedia page for formulas for the mean and variance of the log-normal distribution in terms of mu and sigma.)
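For reference, those formulas are mean = exp(mu + sigma^2/2) and variance = (exp(sigma^2) - 1) * exp(2*mu + sigma^2). A minimal sketch, assuming hypothetical values for mu and sigma (they are not taken from the question's data), that checks the closed-form expressions against scipy.stats.lognorm:

```python
import numpy as np
from scipy import stats

# Hypothetical parameters of the underlying normal distribution (illustrative only).
mu, sigma = 2.0, 0.5

# Closed-form mean and variance of the log-normal distribution.
lognormal_mean = np.exp(mu + sigma**2 / 2)
lognormal_var = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)

# The same quantities from SciPy, using s = sigma and scale = exp(mu).
dist = stats.lognorm(sigma, scale=np.exp(mu))
print(lognormal_mean, dist.mean())
print(lognormal_var, dist.var())
```

In practice, estimated_mu and estimated_sigma from the fit would take the place of mu and sigma.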

To combine the histogram and the PDF, you can use, for example,

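    import matplotlib.pyplot as plt

    # density=True normalizes the histogram so it is comparable to the PDF
    # (it replaces the older normed=True, which newer Matplotlib no longer accepts).
    plt.hist(data, bins=50, density=True, color='c', alpha=0.75)
    xmin = data.min()
    xmax = data.max()
    x = np.linspace(xmin, xmax, 100)
    pdf = stats.lognorm.pdf(x, s, scale=scale)
    plt.plot(x, pdf, 'k')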

If you want to see the log of the data, you can do something like the following. Note that the PDF of the normal distribution is used here.

    logdata = np.log(data)
    # density=True replaces the older normed=True
    plt.hist(logdata, bins=40, density=True, color='c', alpha=0.75)
    xmin = logdata.min()
    xmax = logdata.max()
    x = np.linspace(xmin, xmax, 100)
    pdf = stats.norm.pdf(x, loc=estimated_mu, scale=estimated_sigma)
    plt.plot(x, pdf, 'k')
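The question also asked how to mark the sigma region on the plot. One possible sketch, assuming the region of interest is mu +/- 1 sigma of the log of the data, and using synthetic samples as a stand-in for data (the generator parameters are hypothetical):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Synthetic stand-in for `data` (hypothetical; replace with the real samples).
rng = np.random.default_rng(0)
data = rng.lognormal(mean=2.0, sigma=0.5, size=5000)

logdata = np.log(data)
estimated_mu, estimated_sigma = stats.norm.fit(logdata)

fig, ax = plt.subplots()
ax.hist(logdata, bins=40, density=True, color='c', alpha=0.75)
x = np.linspace(logdata.min(), logdata.max(), 100)
ax.plot(x, stats.norm.pdf(x, loc=estimated_mu, scale=estimated_sigma), 'k')

# Shade mu +/- 1 sigma and mark mu itself with a dashed vertical line.
lo = estimated_mu - estimated_sigma
hi = estimated_mu + estimated_sigma
ax.axvspan(lo, hi, color='gray', alpha=0.3, label='mu +/- 1 sigma')
ax.axvline(estimated_mu, color='r', linestyle='--', label='mu')
ax.legend()
# Call plt.show() or fig.savefig(...) to view the result.
```

On a plot of the untransformed data, the corresponding band would run from exp(mu - sigma) to exp(mu + sigma).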

By the way, an alternative to fitting with stats.lognorm is to fit log(data) using stats.norm.fit:

    logdata = np.log(data)
    estimated_mu, estimated_sigma = stats.norm.fit(logdata)
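Both approaches are maximum-likelihood fits, so they should give nearly the same estimates. A small sketch that compares them on synthetic samples with known parameters (the values of true_mu and true_sigma are hypothetical, chosen only for illustration):

```python
import numpy as np
from scipy import stats

# Synthetic samples with known parameters (hypothetical values).
rng = np.random.default_rng(42)
true_mu, true_sigma = 2.0, 0.5
data = rng.lognormal(mean=true_mu, sigma=true_sigma, size=10000)

# Approach 1: fit the log-normal directly, with the location fixed at 0.
s, loc, scale = stats.lognorm.fit(data, floc=0)
mu_from_lognorm, sigma_from_lognorm = np.log(scale), s

# Approach 2: fit a normal distribution to the logarithm of the data.
mu_from_norm, sigma_from_norm = stats.norm.fit(np.log(data))

print(mu_from_lognorm, mu_from_norm)
print(sigma_from_lognorm, sigma_from_norm)
```

Note that this relies on the natural logarithm. With np.log10, as in the question, the fitted values would be in base-10 units and would not match np.log(scale) directly.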
