What specific requirements must the function passed to scipy.optimize.curve_fit fulfill in order to run?

Question

I am working on fitting statistical models to distributions using the matplotlib hist function. For example, my code fits an exponential distribution as follows:

    try:
        def expDist(x, a, x0):
            return a*(exp(-(x/x0))/x0)

        self.n, self.bins, patches = plt.hist(self.getDataSet(), self.getDatasetSize()/10, normed=1, facecolor='blue', alpha = 0.55)
        popt,pcov = curve_fit(expDist,self.bins[:-1], self.n, p0=[1,mean])
        print "Fitted gaussian curve to data with params a %f, x0 %f" % (popt[0], popt[1])
        self.a = popt[0]
        self.x0 = popt[1]
        self.fitted = True
    except RuntimeError:
        print "Unable to fit data to exponential curve"

This works fine, but when I modify it to do the same for a uniform distribution between a and b,

    def uniDist(x, a, b):
        if((x >= a)and(x <= b)):
            return float(1.0/float(b-a))
        else:
            return 0.000

    try:
        self.n, self.bins, patches = plt.hist(self.getDataSet(), self.getDatasetSize()/10, normed=1, facecolor='blue', alpha = 0.55)
        popt,pcov = curve_fit(uniDist,self.bins[:-1], self.n, p0=[a, b])
        print "Fitted uniform distribution curve to data with params a %f, b %f" % (popt[0], popt[1])
        self.a = popt[0]
        self.b = popt[1]
        self.fitted = True
    except RuntimeError:
        print "Unable to fit data to uniform distribution pdf curve"

the code fails with:

    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The problem seems to be that somewhere inside curve_fit, the function to be fitted (expDist or uniDist in this case) is called with an iterable of values. But then I can't figure out how expDist manages to accept an iterable without crashing.

python numpy scipy mathematical-optimization curve-fitting

BruceJohnJennerLawso Jan 28 '17 at 0:04

1 answer

Andras deak · Accepted Answer · 2017-01-28T01:11:58+0000

Your suspicion is partially correct. curve_fit does pass an iterable to the function, but not just any iterable: a numpy.ndarray. These have vectorized arithmetic operators, therefore

 a*(exp(-(x/x0))/x0)

will just work elementwise on array inputs without any errors (and with the correct output). There is not much magic involved: for each function evaluation, the parameters a and x0 will be scalars; only x is an array.
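To see this in practice, here is a minimal sketch that records what curve_fit hands to the model on each evaluation. The names exp_dist and seen, the synthetic noise-free data, and the starting guess are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np
from scipy.optimize import curve_fit

seen = []  # one (type of x, ndim of a, ndim of x0) entry per model evaluation

def exp_dist(x, a, x0):
    # Record the kind of arguments received, then evaluate the model
    # using numpy's elementwise arithmetic.
    seen.append((type(x), np.ndim(a), np.ndim(x0)))
    return a * np.exp(-x / x0) / x0

xdata = np.linspace(0.0, 5.0, 50)
ydata = exp_dist(xdata, 1.0, 2.0)  # synthetic data, assumed for the demo
seen.clear()                       # forget the call made just above

popt, pcov = curve_fit(exp_dist, xdata, ydata, p0=[0.8, 1.5])

# Every call made by curve_fit received an ndarray for x and
# zero-dimensional (scalar) values for the parameters.
print(all(t is np.ndarray and na == 0 and nx0 == 0 for t, na, nx0 in seen))
print(popt)
```

So the practical requirement is that the model accepts an array as its first argument and returns an array of matching length, which plain arithmetic on ndarrays gives you for free.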

Now the problem with uniDist is that it contains not only arithmetic operators but also comparison operators. These work fine as long as you only compare a single array with a scalar:

    >>> import numpy as np
    >>> a = np.arange(5)
    >>> a
    array([0, 1, 2, 3, 4])
    >>> a>2
    array([False, False, False,  True,  True], dtype=bool)

The above shows that using comparison operators on an array and a scalar again produces elementwise results. The error occurs when you try to apply a logical operator to two of these boolean arrays:

    >>> a>2
    array([False, False, False,  True,  True], dtype=bool)
    >>> a<4
    array([ True,  True,  True,  True, False], dtype=bool)
    >>> (a>2) and (a<4)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The error message is a bit confusing. It can be traced back to the fact that python tries to come up with a single result for array1 and array2 (which for native python sequences would return one of the two operands, depending on whether the first one is empty). However, numpy suspects that this is not what you want to do, and resists the temptation to guess.
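The difference between native python sequences and numpy arrays can be sketched as follows. The helper name truth_value_is_ambiguous is made up for this illustration.

```python
import numpy as np

# For native sequences, `and` returns one of its operands, chosen by the
# truthiness (emptiness) of the first one.
print([] and [1, 2])    # the empty list is falsy, so it is returned
print([0] and [1, 2])   # a non-empty list is truthy, so the second is returned

def truth_value_is_ambiguous(arr):
    # `x and y` first asks for bool(x); numpy refuses to answer that
    # for an array with more than one element.
    try:
        bool(arr)
    except ValueError:
        return True
    return False

print(truth_value_is_ambiguous(np.array([True, False])))
```

This is why the exception is raised by the `and` itself, before your function gets a chance to return anything.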

Since you want your function to work elementwise on two boolean arrays (which came from the comparison operations), you will need to use the & operator. This is the bitwise "and" in native python, but for numpy arrays it gives you the elementwise logical "and" of the arrays. You can also use numpy.logical_and (or in your case scipy.logical_and) to be more explicit:

    >>> (a>2) & (a<4)
    array([False, False, False,  True, False], dtype=bool)
    >>> np.logical_and(a>2,a<4)
    array([False, False, False,  True, False], dtype=bool)

Please note that with & you always need to parenthesize your comparisons, since a>2&a<4 would be ambiguous (to the programmer) and wrong (considering what you want it to do). Since the bitwise "and" of booleans behaves exactly as you would expect, it is safe to rewrite your function to use & instead of and for combining the two comparisons.
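A short sketch of why the parentheses matter: & binds more tightly than the comparison operators, so the unparenthesized form is parsed as a chained comparison, which in turn needs the truth value of an array. The function name unparenthesized is only for illustration.

```python
import numpy as np

a = np.arange(5)

# Parenthesized: two boolean arrays combined elementwise.
mask = (a > 2) & (a < 4)
print(mask)

def unparenthesized(arr):
    # Parsed as arr > (2 & arr) < 4, i.e. a chained comparison equivalent to
    # (arr > (2 & arr)) and ((2 & arr) < 4), which needs bool() of an array.
    try:
        return arr > 2 & arr < 4
    except ValueError as err:
        return err

print(unparenthesized(a))
```

With an array input the unparenthesized version ends up with the same ValueError as the original and-based code.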

However, there is one more thing you will need to change: for ndarray inputs, the if will also behave differently. Python cannot help but make a single choice in an if, and that is still true when you put an array into it. What you really want is to constrain the elements of your output elementwise (again). So you either have to loop over your array (don't), or make this choice in a vectorized way again. The latter is idiomatic with numpy/scipy:

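    import numpy as np  # older SciPy releases also exposed where() as scipy.where

    def uniDist(x, a, b):
        # vectorized choice: 1/(b-a) where a <= x <= b, 0 elsewhere
        return np.where((a<=x) & (x<=b), 1.0/(b-a), 0.0)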

This (namely numpy.where) will return an array of the same size as x. For elements where the condition is True, the output value will be 1/(b-a); the rest will be 0. For a scalar x, the return value is a zero-dimensional numpy array, which behaves like a scalar. Note that I removed the float conversion in the example above, since having 1.0 in the numerator will definitely give you true division, even if you are using python 2. I would still suggest using python 3, or at least from __future__ import division.
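As a quick check of this behaviour, here is a minimal sketch that evaluates a where-based density on an array and on a scalar. The name uni_dist and the sample values are assumptions chosen for the demonstration.

```python
import numpy as np

def uni_dist(x, a, b):
    # Elementwise choice: 1/(b-a) inside [a, b], 0 outside.
    return np.where((a <= x) & (x <= b), 1.0 / (b - a), 0.0)

x = np.linspace(0.0, 4.0, 9)    # 0.0, 0.5, ..., 4.0
pdf = uni_dist(x, 1.0, 3.0)

print(pdf.shape == x.shape)     # output has the same shape as the input
print(pdf)

# A scalar input also works; the result can be converted with float().
print(float(uni_dist(2.0, 1.0, 3.0)))
```

Because the function accepts an array and returns an array of the same shape, it meets what curve_fit needs in order to call it. Whether a least-squares fit of such a piecewise-constant model converges to useful values is a separate question that this sketch does not address.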

A minor note: even for the scalar case, I would suggest using python's operator chaining for comparisons, which lends itself to this purpose. I mean that you can just write if a <= x <= b: ..., and unlike in most languages, this will be functionally equivalent to what you wrote (but prettier).
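For completeness, a scalar-only version using a chained comparison could look like the sketch below. The name uni_dist_scalar is hypothetical, and this form is not suitable for passing to curve_fit, since it fails on arrays with more than one element for the reasons described above.

```python
def uni_dist_scalar(x, a, b):
    # a <= x <= b is evaluated as (a <= x) and (x <= b), with x evaluated once.
    if a <= x <= b:
        return 1.0 / (b - a)
    return 0.0

print(uni_dist_scalar(2.0, 1.0, 3.0))
print(uni_dist_scalar(5.0, 1.0, 3.0))
```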
