Polynomial fitting using the GNU Scientific Library in C

I need to fit an n-degree polynomial to n + 1 data points. Using the following gnuplot script, I get a correct fit:

f(x) = a + b*x + c*x**2 + d*x**3 + e*x**4 + f*x**5 + g*x**6 + h*x**7 + i*x**8 + j*x**9 + k*x**10 + l*x**11 + m*x**12

# Initial values for parameters
a = 0.1
b = 0.1
c = 0.1
d = 0.1
e = 0.1
f = 0.1
g = 0.1
h = 0.1
i = 0.1
j = 0.1
k = 0.1
l = 0.1
m = 0.1

# Fit f to the following data by modifying the variables a, b, c
fit f(x) '-' via a, b, c, d, e, f, g, h, i, j, k, l, m
4.877263 45.036000
4.794907 44.421000
4.703827 43.808000
4.618065 43.251000
4.530520 42.634000
4.443111 42.070000
4.357077 41.485000
4.274298 40.913000
4.188404 40.335000
4.109381 39.795000
4.027594 39.201000
3.946413 38.650000
3.874360 38.085000
e

After the fit, I received the following coefficients:

 a = -781956
 b = -2.52463e+06
 c = 2.75682e+06
 d = -553791
 e = 693880
 f = -1.51285e+06
 g = 1.21157e+06
 h = -522243
 i = 138121
 j = -23268.8
 k = 2450.79
 l = -147.834
 m = 3.91268

Plotting the data and f(x) together, the coefficients appear to be correct: one can see the fitted function passing through the data points.

However, I need to obtain such a fit from C code. In some cases, the GNU Scientific Library code for polynomial fitting (as in this link) gives the correct result. But for the above data (and a few other cases in my dataset), the result I got was garbage.
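For context, a polynomialfit along the lines of the commonly cited GSL example builds a Vandermonde design matrix and calls gsl_multifit_linear. This is only a rough sketch of that pattern; the exact code is in the link above and its details may differ:

 #include <math.h>
 #include <stdbool.h>
 #include <gsl/gsl_multifit.h>

 /* Rough sketch of a polynomialfit() in the usual gsl_multifit_linear style.
    It fits 'degree' coefficients to 'obs' observations by ordinary least
    squares on a Vandermonde design matrix. */
 bool polynomialfit(int obs, int degree, double *dx, double *dy, double *store)
 {
     gsl_matrix *X = gsl_matrix_alloc(obs, degree);
     gsl_vector *y = gsl_vector_alloc(obs);
     gsl_vector *c = gsl_vector_alloc(degree);
     gsl_matrix *cov = gsl_matrix_alloc(degree, degree);
     gsl_multifit_linear_workspace *ws = gsl_multifit_linear_alloc(obs, degree);
     double chisq;
     int i, j;

     for (i = 0; i < obs; i++) {
         for (j = 0; j < degree; j++)
             gsl_matrix_set(X, i, j, pow(dx[i], j));  /* column j holds x^j */
         gsl_vector_set(y, i, dy[i]);
     }

     gsl_multifit_linear(X, y, c, cov, &chisq, ws);  /* return status ignored here */

     for (j = 0; j < degree; j++)
         store[j] = gsl_vector_get(c, j);

     gsl_multifit_linear_free(ws);
     gsl_matrix_free(X);
     gsl_matrix_free(cov);
     gsl_vector_free(y);
     gsl_vector_free(c);
     return true;
 }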

For example, the following code (which uses the same data as above):

 void testOfPolynomialFit() {
     double x[13] = {4.877263, 4.794907, 4.703827, 4.618065, 4.530520,
                     4.443111, 4.357077, 4.274298, 4.188404, 4.109381,
                     4.027594, 3.946413, 3.874360};
     double y[13] = {45.036000, 44.421000, 43.808000, 43.251000, 42.634000,
                     42.070000, 41.485000, 40.913000, 40.335000, 39.795000,
                     39.201000, 38.650000, 38.085000};
     double coefficients[13];

     polynomialfit(13, 13, x, y, coefficients);

     int i, n = 13;
     for (i = 0; i < n; i++) {
         printf("%lf\t", coefficients[i]);
     }
     printf("\n");
 }

Results in:

 -6817581083.803348 12796304366.105989 -9942834843.404181 3892080279.353104 -630964566.517794 -75914607.005088 49505072.518952 -5062100.000931 -1426228.491628 514259.312320 -70903.844354 4852.824607 -136.738756 

Which corresponds to a function of the form:

 c(x)=-6837615134.799868+12834646330.586414*x**1-9973474377.668280*x**2+3904659818.834625*x**3-633282611.288889*x**4-76066283.747942*x**5+49670960.939126*x**6-5091123.449217*x**7-1426628.818192*x**8+515175.778491*x**9-71055.177018*x**10+4863.969973*x**11-137.065848*x**12 

You can see what c(x) looks like here:

[plot of c(x), a(x) and b(x) over the data points]

In that image, a(x) and b(x) are functions fitted with polynomialfit using only a few of the points (4 and 7, respectively).

So, any tips on what I am doing wrong here? Is there other C code that produces a correct fit?

3 answers

One of the main differences between your two approaches is that with gnuplot you fit the coefficients and plot the function in the same program, whereas with GSL you copy the numbers from one program to another.

If you use the output of printf("%lf", ...) as the input to the second gnuplot program, you have lost a lot of precision, because printf rounds the numbers much more aggressively than any internal operation in either program. And since this is a numerically unstable problem, a little rounding hurts a lot.

When x is 4.877263, x**12 is approximately 181181603.850932, so if your m is off by just 0.000001 (the default rounding of printf's %lf), that alone introduces an error of about 181.18, a relative error of several hundred percent of the actual y value for that x.

Try %.60lf and see if it improves.
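A minimal sketch of the idea, assuming you stay with plain double: print the coefficients with 17 significant digits, which is enough to round-trip an IEEE double, instead of %lf's default 6 digits after the decimal point:

 #include <stdio.h>

 /* Print each coefficient with 17 significant digits so the value survives
    a text round-trip; %lf alone keeps only 6 decimals. */
 void print_coefficients(const double *c, int n)
 {
     int i;
     for (i = 0; i < n; i++)
         printf("c[%d] = %.17g\n", i, c[i]);
 }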

If one of the programs uses long double internally and the other does not, then you probably won't get a good match no matter what you do.


You suffer from numerical instability. A simple linear regression confirms what can be visually observed: 99.98% of the variation can be explained using only a linear model.
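A minimal sketch of that check with GSL's simple linear fit (the helper name linear_r_squared is mine, not from the question or the linked code):

 #include <gsl/gsl_fit.h>
 #include <gsl/gsl_statistics.h>

 /* R^2 = 1 - SS_res / SS_tot for an ordinary least-squares line y = c0 + c1*x. */
 double linear_r_squared(const double *x, const double *y, size_t n)
 {
     double c0, c1, cov00, cov01, cov11, sumsq;
     gsl_fit_linear(x, 1, y, 1, n, &c0, &c1, &cov00, &cov01, &cov11, &sumsq);

     double mean = gsl_stats_mean(y, 1, n);
     double ss_tot = 0.0;
     for (size_t i = 0; i < n; i++)
         ss_tot += (y[i] - mean) * (y[i] - mean);

     return 1.0 - sumsq / ss_tot;  /* per the claim above, about 0.9998 here */
 }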

The code in the link does several very unsafe things: it does not check that obs or degree is positive, it does not check that the memory allocations succeeded, it does not return anything useful, ...

I would suspect that gsl_multifit_linear either overflows or cannot cope with the numerical instability, and since its return status is never checked, we simply don't know.

Edit:

According to the GSL website, polynomial regression can lead to additional numerical instability because large powers of the x values are computed. Try pre-processing the x values with x = (x - avg_x) / sd_x. This should let you get a couple more degrees out of your polynomial before the instability appears.
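A minimal sketch of that pre-processing (the helper name standardize_x is mine): compute avg_x and sd_x, rescale the x array in place, and keep both values, because the fitted coefficients then apply to the rescaled variable rather than the raw x:

 #include <math.h>

 /* Centre and scale the x values in place: x[i] <- (x[i] - avg_x) / sd_x.
    The caller must keep avg_x and sd_x to transform new x values, or to map
    the fitted polynomial back to the original variable. */
 void standardize_x(double *x, int n, double *avg_x, double *sd_x)
 {
     double sum = 0.0, sumsq = 0.0;
     int i;

     for (i = 0; i < n; i++)
         sum += x[i];
     *avg_x = sum / n;

     for (i = 0; i < n; i++)
         sumsq += (x[i] - *avg_x) * (x[i] - *avg_x);
     *sd_x = sqrt(sumsq / (n - 1));  /* sample standard deviation */

     for (i = 0; i < n; i++)
         x[i] = (x[i] - *avg_x) / *sd_x;
 }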

In the long run, you are likely to run into this problem again. If you perform this kind of analysis with 35 or 100 or more data points, it is unlikely that any technique will overcome the instability.


I am a bit confused by the mismatch between the problem statement and the sample code. A polynomial fitting function would normally expect >= n + 2 data points to fit a polynomial of degree n. With exactly n + 1 data points, instead of fitting you can produce an exact solution (if floating-point rounding were not an issue) by building a matrix with n + 1 rows and columns from the given n + 1 x values, representing a set of linear equations:

 | x[0]^n + x[0]^(n-1) + ... + x[0] + 1 |   | c[ n] |   | y[0] |
 | x[1]^n + x[1]^(n-1) + ... + x[1] + 1 |   | c[n-1] |   | y[1] |
 |                  ...                 |   |  ...   | = | ...  |
 | x[n]^n + x[n]^(n-1) + ... + x[n] + 1 |   | c[ 0] |   | y[n] |

Thus only the c[] are unknowns, and the equations are linear in them. Invert the matrix of x values, then multiply the y values by the inverted matrix to get the coefficients. Problems can arise if the actual polynomial is of degree less than n (two or more equations will not be linearly independent). If that happens, you can drop one or more rows (reducing the degree) or switch to an ordinary polynomial fitting algorithm.
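A minimal sketch of that exact-solution approach using GSL's LU routines (the function name and layout are mine; in real code you would also check the return codes). Note that this sketch builds the Vandermonde matrix with column j holding x^j, so coeff[j] multiplies x^j, i.e. the coefficients come out in ascending order rather than the descending order shown above:

 #include <math.h>
 #include <gsl/gsl_linalg.h>

 /* Solve V c = y exactly for npts points and a degree npts-1 polynomial,
    where V is the (npts x npts) Vandermonde matrix of the x values. */
 int solve_exact_polynomial(int npts, const double *x, const double *y, double *coeff)
 {
     gsl_matrix *V = gsl_matrix_alloc(npts, npts);
     gsl_vector *b = gsl_vector_alloc(npts);
     gsl_vector *c = gsl_vector_alloc(npts);
     gsl_permutation *p = gsl_permutation_alloc(npts);
     int signum, status;
     int i, j;

     for (i = 0; i < npts; i++) {
         for (j = 0; j < npts; j++)
             gsl_matrix_set(V, i, j, pow(x[i], j));  /* column j holds x^j */
         gsl_vector_set(b, i, y[i]);
     }

     gsl_linalg_LU_decomp(V, p, &signum);
     status = gsl_linalg_LU_solve(V, p, b, c);

     for (j = 0; j < npts; j++)
         coeff[j] = gsl_vector_get(c, j);

     gsl_permutation_free(p);
     gsl_vector_free(c);
     gsl_vector_free(b);
     gsl_matrix_free(V);
     return status;
 }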

If there are >= n + 2 data points, then this becomes a polynomial least-squares fitting problem. Here is a link to a .rtf document that uses orthogonal, recursively defined polynomials to fit the data points. The orthogonal polynomials eliminate the need to invert a matrix, so higher-degree polynomials can be handled more accurately.

opls.rtf

The example below is run with degree = 1 and degree = 3. The first column is the original x value, the second column is the original y value, the third column is the calculated y value, and the fourth column is the residual (original y minus calculated y):

 variance = 9.6720e-004
 6.8488e+000 X**1 + 1.1619e+001

 4.877263e+000  4.503600e+001  4.502243e+001   1.356604e-002
 4.794907e+000  4.442100e+001  4.445839e+001  -3.739271e-002
 4.703827e+000  4.380800e+001  4.383460e+001  -2.660237e-002
 4.618065e+000  4.325100e+001  4.324723e+001   3.765950e-003
 4.530520e+000  4.263400e+001  4.264765e+001  -1.365429e-002
 4.443111e+000  4.207000e+001  4.204901e+001   2.099404e-002
 4.357077e+000  4.148500e+001  4.145977e+001   2.522524e-002
 4.274298e+000  4.091300e+001  4.089284e+001   2.016354e-002
 4.188404e+000  4.033500e+001  4.030456e+001   3.043591e-002
 4.109381e+000  3.979500e+001  3.976335e+001   3.165004e-002
 4.027594e+000  3.920100e+001  3.920321e+001  -2.205684e-003
 3.946413e+000  3.865000e+001  3.864721e+001   2.788204e-003
 3.874360e+000  3.808500e+001  3.815373e+001  -6.873392e-002

 variance = 2.4281e-004
 8.0952e-001 X**3 + -1.0822e+001 X**2 + 5.4910e+001 X**1 + -5.9287e+001

 4.877263e+000  4.503600e+001  4.502276e+001   1.324045e-002
 4.794907e+000  4.442100e+001  4.444280e+001  -2.180419e-002
 4.703827e+000  4.380800e+001  4.381431e+001  -6.306292e-003
 4.618065e+000  4.325100e+001  4.323170e+001   1.929905e-002
 4.530520e+000  4.263400e+001  4.264294e+001  -8.935141e-003
 4.443111e+000  4.207000e+001  4.205786e+001   1.214442e-002
 4.357077e+000  4.148500e+001  4.148153e+001   3.468503e-003
 4.274298e+000  4.091300e+001  4.092369e+001  -1.069376e-002
 4.188404e+000  4.033500e+001  4.033844e+001  -3.436876e-003
 4.109381e+000  3.979500e+001  3.979160e+001   3.397859e-003
 4.027594e+000  3.920100e+001  3.921454e+001  -1.354191e-002
 3.946413e+000  3.865000e+001  3.862800e+001   2.199866e-002
 3.874360e+000  3.808500e+001  3.809383e+001  -8.830768e-003
