Numpy polyfit with data that have different levels of statistical significance

Polyfit is a great tool for fitting lines to multiple points. However, my data have different levels of statistical significance.

For example, for one point (x1, y1) I could have only 10 observations, while for another point (x2, y2) I could have 10,000. I usually have at least 10 points, and I would like each point's statistical significance to be taken into account when using polyfit. Is there a way (or a similar function) that allows this?

+4
3 answers

One possibility is to use weighted least squares in statsmodels, where:

y is the response or endogenous variable (endog)

x is your one-dimensional explanatory variable

w is your weight array; the larger the weight, the more that observation counts

To build the polynomial design matrix and fit:

 import numpy as np
 import statsmodels.api as sm

 # exog: polynomial design matrix; w: per-observation weights
 exog = np.vander(x, degree + 1)
 result = sm.WLS(y, exog, weights=w).fit()

The estimated parameters are in result.params; the fitted values are in result.fittedvalues

Prediction has changed between versions. With version 0.4 you can use:

 result.predict(np.vander(x_new, degree+1)) 
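
For completeness, a minimal self-contained sketch of this answer, assuming each point's weight is simply its number of observations (the data and counts below are made up for illustration):

 import numpy as np
 import statsmodels.api as sm

 # made-up (x, y) points and the number of observations behind each one
 x = np.array([0.0, 1.0, 2.0, 3.0])
 y = np.array([1.2, 1.9, 3.1, 3.9])
 n_obs = np.array([10.0, 10000.0, 250.0, 80.0])

 degree = 1
 exog = np.vander(x, degree + 1)                  # polynomial design matrix
 result = sm.WLS(y, exog, weights=n_obs).fit()    # weighted least squares

 print(result.params)                             # coefficients, highest power first
 print(result.predict(np.vander(np.array([1.5, 2.5]), degree + 1)))
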
+3

Or, more simply:

 import numpy as np

 # w: weight of each observation
 result = np.polynomial.polynomial.polyfit(x, y, deg, w=weights)
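
For example (made-up arrays; this polyfit returns coefficients ordered from lowest to highest degree), the fitted polynomial can then be evaluated with polyval:

 import numpy as np

 # made-up points and per-point weights (larger weight = more trusted observation)
 x = np.array([0.0, 1.0, 2.0, 3.0])
 y = np.array([1.2, 1.9, 3.1, 3.9])
 weights = np.array([10.0, 10000.0, 250.0, 80.0])

 coeffs = np.polynomial.polynomial.polyfit(x, y, 1, w=weights)   # lowest degree first
 y_new = np.polynomial.polynomial.polyval(np.array([1.5, 2.5]), coeffs)
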
+2

I don't know about numpy, but you can write your own polyfit function. Polyfit is simply the solution of a linear system.

http://en.wikipedia.org/wiki/Polynomial_regression#Matrix_form_and_calculation_of_estimates
(in your case epsilon is probably 0)

You can see that all you have to do is multiply each entry of y and each row of the design matrix built from x by the weight of that observation before solving.
This should be about 10 lines of code (I remember it took me 4 hours to derive the least-squares equations myself, but only 2 lines of code in MATLAB).
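
A rough sketch of that approach (hypothetical names; weights assumed proportional to the number of observations at each point):

 import numpy as np

 def weighted_polyfit(x, y, w, degree):
     """Weighted least-squares polynomial fit via the normal equations.

     Solves (X^T W X) beta = X^T W y, where W is the diagonal matrix of weights.
     """
     X = np.vander(x, degree + 1)      # design matrix, highest power first
     XtW = X.T * w                     # scale each row of X (column of X.T) by its weight
     return np.linalg.solve(XtW @ X, XtW @ y)

 # made-up example: weights proportional to observation counts
 x = np.array([0.0, 1.0, 2.0, 3.0])
 y = np.array([1.1, 2.0, 2.9, 4.2])
 w = np.array([10.0, 10000.0, 500.0, 100.0])
 print(weighted_polyfit(x, y, w, degree=1))   # coefficients, highest power first
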

+1

Source: https://habr.com/ru/post/1385601/
