Numpy polyfit with data that have different levels of statistical significance

Polyfit is a great tool for fitting lines to multiple points. However, my data have different levels of statistical significance.

For example, for one point (x1, y1) I could have only 10 observations, while for another point (x2, y2) I could have 10,000. I usually have at least 10 points, and I would like each point's statistical significance to be taken into account when using polyfit. Is there a way (or a similar function) that allows this?

+4
3 answers

One possibility is to use weighted least squares in statsmodels, where:

y is the response or endogenous variable (endog)

x is your one-dimensional explanatory variable

w is your weight array; the larger the weight, the more that observation counts

To build the polynomial design matrix and fit:

 import numpy as np
 import statsmodels.api as sm

 # exog: polynomial design matrix; w: per-observation weights
 exog = np.vander(x, degree + 1)
 result = sm.WLS(y, exog, weights=w).fit()

The estimated parameters are in result.params; the fitted values are in result.fittedvalues

Prediction has changed between versions. With version 0.4 you can use:

 result.predict(np.vander(x_new, degree+1)) 
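
For completeness, a minimal self-contained sketch of this answer, assuming each point's weight is simply its number of observations (the data and counts below are made up for illustration):

 import numpy as np
 import statsmodels.api as sm

 # made-up (x, y) points and the number of observations behind each one
 x = np.array([0.0, 1.0, 2.0, 3.0])
 y = np.array([1.2, 1.9, 3.1, 3.9])
 n_obs = np.array([10.0, 10000.0, 250.0, 80.0])

 degree = 1
 exog = np.vander(x, degree + 1)                  # polynomial design matrix
 result = sm.WLS(y, exog, weights=n_obs).fit()    # weighted least squares

 print(result.params)                             # coefficients, highest power first
 print(result.predict(np.vander(np.array([1.5, 2.5]), degree + 1)))
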
+3

Or, more simply:

 import numpy as np

 # w: weight of each observation
 result = np.polynomial.polynomial.polyfit(x, y, deg, w=weights)
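
For example (made-up arrays; this polyfit returns coefficients ordered from lowest to highest degree), the fitted polynomial can then be evaluated with polyval:

 import numpy as np

 # made-up points and per-point weights (larger weight = more trusted observation)
 x = np.array([0.0, 1.0, 2.0, 3.0])
 y = np.array([1.2, 1.9, 3.1, 3.9])
 weights = np.array([10.0, 10000.0, 250.0, 80.0])

 coeffs = np.polynomial.polynomial.polyfit(x, y, 1, w=weights)   # lowest degree first
 y_new = np.polynomial.polynomial.polyval(np.array([1.5, 2.5]), coeffs)
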
+2

I don't know about numpy, but you can write your own polyfit function. Polyfit is simply the solution of a linear system.

http://en.wikipedia.org/wiki/Polynomial_regression#Matrix_form_and_calculation_of_estimates
(in your case epsilon is probably 0)

You can see that all you have to do is multiply each entry of y and each row of the design matrix built from x by the weight of that observation before solving.
This should be about 10 lines of code (I remember it took me 4 hours to derive the least-squares equations myself, but only 2 lines of code in MATLAB).
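
A rough sketch of that approach (hypothetical names; weights assumed proportional to the number of observations at each point):

 import numpy as np

 def weighted_polyfit(x, y, w, degree):
     """Weighted least-squares polynomial fit via the normal equations.

     Solves (X^T W X) beta = X^T W y, where W is the diagonal matrix of weights.
     """
     X = np.vander(x, degree + 1)      # design matrix, highest power first
     XtW = X.T * w                     # scale each row of X (column of X.T) by its weight
     return np.linalg.solve(XtW @ X, XtW @ y)

 # made-up example: weights proportional to observation counts
 x = np.array([0.0, 1.0, 2.0, 3.0])
 y = np.array([1.1, 2.0, 2.9, 4.2])
 w = np.array([10.0, 10000.0, 500.0, 100.0])
 print(weighted_polyfit(x, y, w, degree=1))   # coefficients, highest power first
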

+1

Source: https://habr.com/ru/post/1385601/
