I am trying to find the right way to fit a beta distribution. This is not a real-world problem; I am just testing the effects of a few different methods, and in doing so something puzzles me.
Here is the Python code I am working on, in which I test three different approaches: 1>: fitting using moments (sample mean and variance). 2>: fitting by minimizing the negative log-likelihood (using scipy.optimize.fmin()). 3>: simply calling scipy.stats.beta.fit().
from scipy.optimize import fmin
from scipy.stats import beta
from scipy.special import gamma as gammaf
import matplotlib.pyplot as plt
import numpy
def betaNLL(param, *args):
    '''Negative log-likelihood function for the beta distribution.
    <param>: sequence of the two shape parameters to be fitted.
    <args>: 1-element tuple containing the sample data.
    Return <nll>: negative log-likelihood to be minimized.
    '''
    a, b = param
    data = args[0]
    pdf = beta.pdf(data, a, b, loc=0, scale=1)
    lg = numpy.log(pdf)
    # Mask out -inf terms (where pdf == 0) so the sum stays finite
    lg = numpy.where(lg == -numpy.inf, 0, lg)
    nll = -1 * numpy.sum(lg)
    return nll
# Generate a sample from a known beta(5, 2)
data = beta.rvs(5, 2, loc=0, scale=1, size=500)

# 1>: fit from moments (sample mean and variance)
mean = numpy.mean(data)
var = numpy.var(data, ddof=1)
alpha1 = mean**2 * (1 - mean) / var - mean
beta1 = alpha1 * (1 - mean) / mean

# 2>: fit by minimizing the negative log-likelihood
result = fmin(betaNLL, [1, 1], args=(data,))
alpha2, beta2 = result

# 3>: scipy.stats.beta.fit (also returns fitted loc and scale)
alpha3, beta3, loc3, scale3 = beta.fit(data)
print('\n# alpha,beta from moments:', alpha1, beta1)
print('# alpha,beta from mle:', alpha2, beta2)
print('# alpha,beta from beta.fit:', alpha3, beta3)
plt.hist(data, bins=30, density=True)

# Beta pdf (loc=0, scale=1) for plotting the fitted curves
fitted = lambda x, a, b: gammaf(a + b) / gammaf(a) / gammaf(b) * x**(a - 1) * (1 - x)**(b - 1)
xx = numpy.linspace(0, max(data), len(data))
plt.plot(xx, fitted(xx, alpha1, beta1), 'g')  # moments
plt.plot(xx, fitted(xx, alpha2, beta2), 'b')  # mle
plt.plot(xx, fitted(xx, alpha3, beta3), 'r')  # beta.fit
plt.show()
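(One aside on step 3: by default beta.fit also estimates loc and scale, so it is solving a four-parameter problem while the other two methods solve a two-parameter one. A minimal sketch of pinning those values with scipy's floc/fscale keyword arguments, continuing the script above, would be:)

# Sketch: fix loc=0, scale=1 so beta.fit estimates only a and b,
# making it comparable to the two-parameter fits above.
alpha3c, beta3c, _, _ = beta.fit(data, floc=0, fscale=1)
print('# alpha,beta from beta.fit with floc/fscale:', alpha3c, beta3c)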
The problem I have is with the normalization process (z = (x - a)/(b - a)), where a and b are the min and max of the sample, respectively.
When I don't do the normalization, everything works fine: there are slight differences among the three fitting methods, but the results are all reasonably good.

But when I do the normalization, here is the result plot I get.

[result plot]

The fit using moments (green line) looks OK.

The scipy.stats.beta.fit() result (red line) comes out essentially uniform, no matter what parameters I use to generate the random numbers.

And the MLE fit (blue line) fails.

So it seems the normalization is creating these problems. Presumably that is because after normalization the sample contains x=0 and x=1 exactly, where the log-pdf is -inf. But x=0 and x=1 should be legal values for a beta distribution, so is it wrong to normalize a sample to [0,1]? And if so, what is the proper way to fit?
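To make the suspected failure mode concrete, here is a small standalone sketch (my own illustration, not part of the script above) showing that min-max normalization maps the extreme sample points to exactly 0 and 1, where the beta log-pdf is -inf:

import numpy
from scipy.stats import beta

data = beta.rvs(5, 2, size=500)
z = (data - data.min()) / (data.max() - data.min())  # min-max normalization
# The smallest and largest points are now exactly 0 and 1, where the
# beta(5,2) pdf is 0, so the log-likelihood contains -inf terms.
print(beta.pdf(z.min(), 5, 2), beta.pdf(z.max(), 5, 2))  # -> 0.0 0.0
print(numpy.log(beta.pdf(z, 5, 2)).min())                # -> -inf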