Determining the lognormal distribution using Scipy vs Matlab

I am trying to fit a lognormal distribution using Scipy. I have already done this in Matlab, but because I need to extend the application beyond statistical analysis, I am trying to reproduce the fitted values in Scipy.

The following is the Matlab code that I used for my data:

    % Read input data (one value per line)
    x = [];
    fid = fopen(file_path, 'r');  % reading is the default action for fopen
    disp('Reading network degree data...');
    if fid == -1
        disp('[ERROR] Unable to open data file.')
    else
        while ~feof(fid)
            x = [x fscanf(fid, '%f', [1])];
        end
        c = fclose(fid);
        if c == 0
            disp('File closed successfully.');
        else
            disp('[ERROR] There was a problem with closing the file.');
        end
    end

    [f, xx] = ecdf(x);
    y = 1 - f;             % empirical CCDF
    parmhat = lognfit(x);  % MLE estimate
    mu = parmhat(1);
    sigma = parmhat(2);

And here is the resulting plot:

[plot: empirical CCDF with the Matlab lognormal fit]

Now here is my Python code with the goal of achieving the same:

    import math
    from scipy import stats
    from statsmodels.distributions.empirical_distribution import ECDF

    # The same input is read into the list `degrees` in Python
    ecdf_func = ECDF(degrees)
    x = ecdf_func.x
    ccdf = 1 - ecdf_func.y

    # Fit the data (location fixed at 0)
    shape, loc, scale = stats.lognorm.fit(degrees, floc=0)

    # Parameters
    sigma = shape         # standard deviation of log(x)
    mu = math.log(scale)  # mean of log(x)

    fit_ccdf = stats.lognorm.sf(x, sigma, loc=0, scale=scale)
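As a sanity check of the parameter mapping used above, here is a sketch (on synthetic data with known parameters, standing in for the original degree list) showing that with `floc=0`, scipy's `shape` is sigma and `log(scale)` is mu:

```python
import numpy as np
from scipy import stats

# Synthetic lognormal sample with known parameters
rng = np.random.default_rng(0)
true_mu, true_sigma = 1.5, 0.8
data = rng.lognormal(mean=true_mu, sigma=true_sigma, size=100_000)

# With floc=0, scipy's shape parameter is sigma and log(scale) is mu
shape, loc, scale = stats.lognorm.fit(data, floc=0)
mu, sigma = np.log(scale), shape

print(mu, sigma)  # should be close to 1.5 and 0.8
```

If the recovered values are far from the true ones on data like this, the parameter conversion (not the fit itself) is the first thing to suspect.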

And here is the fit produced by the Python code:

[plot: empirical CCDF with the Python lognormal fit]

As you can see, both pieces of code produce good fits, at least visually.

The problem is that there is a huge difference in the estimated parameters mu and sigma.

From Matlab: mu = 1.62, sigma = 1.29. From Python: mu = 2.78, sigma = 1.74.

Why is there such a difference?

Note: I double-checked that both datasets are read in the same way: the same number of points, the same distribution.

Your help is much appreciated! Thanks in advance.

Additional Information:

    import scipy
    import numpy
    import statsmodels

    scipy.__version__        # '0.9.0'
    numpy.__version__        # '1.6.1'
    statsmodels.__version__  # '0.5.0.dev-1bbd4ca'

Matlab version is R2011b.

Edit:

As shown in the answer below, the error lies with Scipy 0.9. I can reproduce the mu and sigma results from Matlab using Scipy 0.11.0.

An easy way to update your Scipy:

    pip install --upgrade scipy

If you don't have pip (you should!):

    sudo apt-get install python-pip
1 answer

The fit method has a bug in scipy 0.9.0, which is fixed in later versions of scipy.

The output of the script below should be:

    Explicit formula:   mu = 4.99203450, sig = 0.81691086
    Fit log(x) to norm: mu = 4.99203450, sig = 0.81691086
    Fit x to lognorm:   mu = 4.99203468, sig = 0.81691081

but with scipy 0.9.0, it is:

    Explicit formula:   mu = 4.99203450, sig = 0.81691086
    Fit log(x) to norm: mu = 4.99203450, sig = 0.81691086
    Fit x to lognorm:   mu = 4.23197270, sig = 1.11581240

The following test script shows three ways to get the same result:

    import numpy as np
    from scipy import stats

    def lognfit(x, ddof=0):
        x = np.asarray(x)
        logx = np.log(x)
        mu = logx.mean()
        sig = logx.std(ddof=ddof)
        return mu, sig

    # A simple data set for easy reproducibility
    x = np.array([50., 50, 100, 200, 200, 300, 500])

    # Explicit formula
    my_mu, my_sig = lognfit(x)

    # Fit a normal distribution to log(x)
    norm_mu, norm_sig = stats.norm.fit(np.log(x))

    # Fit the lognormal distribution
    lognorm_sig, _, lognorm_expmu = stats.lognorm.fit(x, floc=0)

    print "Explicit formula:   mu = %10.8f, sig = %10.8f" % (my_mu, my_sig)
    print "Fit log(x) to norm: mu = %10.8f, sig = %10.8f" % (norm_mu, norm_sig)
    print "Fit x to lognorm:   mu = %10.8f, sig = %10.8f" % (np.log(lognorm_expmu), lognorm_sig)
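The script above dates from the scipy 0.9/Python 2 era. For readers on Python 3 with a current SciPy, the same three-way comparison can be sketched as follows (same data, behavior unchanged, only the print syntax updated):

```python
import numpy as np
from scipy import stats

def lognfit(x, ddof=0):
    """MLE of (mu, sigma) for a lognormal: fit a normal to log(x)."""
    logx = np.log(np.asarray(x))
    return logx.mean(), logx.std(ddof=ddof)

# The same seven-point data set as in the original script
x = np.array([50., 50, 100, 200, 200, 300, 500])

# 1. Explicit formula
my_mu, my_sig = lognfit(x)

# 2. Fit a normal distribution to log(x)
norm_mu, norm_sig = stats.norm.fit(np.log(x))

# 3. Fit the lognormal distribution with the location fixed at 0
shape, loc, scale = stats.lognorm.fit(x, floc=0)

print(f"Explicit formula:   mu = {my_mu:.8f}, sig = {my_sig:.8f}")
print(f"Fit log(x) to norm: mu = {norm_mu:.8f}, sig = {norm_sig:.8f}")
print(f"Fit x to lognorm:   mu = {np.log(scale):.8f}, sig = {shape:.8f}")
```

On any scipy newer than 0.9.0, all three lines should agree to several decimal places.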

With ddof=1 in the standard deviation calculation, so that the unbiased estimator is used:

    In [104]: x
    Out[104]: array([  50.,   50.,  100.,  200.,  200.,  300.,  500.])

    In [105]: lognfit(x, ddof=1)
    Out[105]: (4.9920345004312647, 0.88236457185021866)

The Matlab lognfit documentation notes that when censoring is not used, lognfit computes sigma using the square root of the unbiased estimate of the variance. This corresponds to using ddof=1 in the code above.
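The two conventions differ only by a factor of sqrt(n/(n-1)), which fully explains the gap between 0.8169 and 0.8824 on the seven-point data set; a quick sketch:

```python
import numpy as np

x = np.array([50., 50, 100, 200, 200, 300, 500])
logx = np.log(x)
n = len(x)

sig_mle = logx.std(ddof=0)       # scipy / MLE convention (divide by n)
sig_unbiased = logx.std(ddof=1)  # Matlab lognfit convention (divide by n - 1)

# The two estimates differ only by the factor sqrt(n / (n - 1))
print(sig_mle, sig_unbiased, sig_mle * np.sqrt(n / (n - 1)))
```

So once mu matches, any remaining discrepancy in sigma of exactly this factor points to the ddof convention, not a bug.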


Source: https://habr.com/ru/post/1204828/

