I have used the following approach, though it may be a bit over the top.
Consider adding d[i] to y[i] to get s[i], a smoothed value. We seek to minimize
S = Sum{ 1<=i<N-1 | sqr( s[i+1] - 2*s[i] + s[i-1]) } + f*Sum{ 0<=i<N | sqr( d[i]) }
The first term is the sum of squares of the (approximate) second derivative of the curve, and the second term penalises departure from the original; f is a (positive) constant. A little algebra recasts this as
S = sqr( ||A*d - b||)
where the matrix A has a nice structure: indeed, A'*A is penta-diagonal, which means the normal equations (i.e. d = Inv(A'*A)*A'*b) can be solved efficiently. Note that d is computed directly; there is no need for an initial estimate.
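To make this concrete, here is a minimal sketch in Python/NumPy (my own code, not part of the original recipe; it assumes N >= 4). It accumulates the bands of A'*A = D'*D + f*I directly, where D is the (N-2) x N second-difference matrix, and solves the pentadiagonal normal equations with a banded Cholesky solver. The band values follow from multiplying out D'*D.

    import numpy as np
    from scipy.linalg import solveh_banded

    def smooth(y, f):
        """Solve (D'D + f*I) d = -D'(D y); return the smoothed curve s = y + d."""
        y = np.asarray(y, dtype=float)
        N = len(y)
        # Upper bands of the pentadiagonal A'*A, in solveh_banded's layout.
        ab = np.zeros((3, N))
        ab[0, 2:] = 1.0                  # 2nd superdiagonal of D'D
        ab[1, 1:] = -4.0                 # 1st superdiagonal of D'D ...
        ab[1, 1] = ab[1, N - 1] = -2.0   # ... smaller at the two ends
        ab[2, :] = 6.0 + f               # diagonal of D'D plus the f*I term
        ab[2, 1] = ab[2, N - 2] = 5.0 + f
        ab[2, 0] = ab[2, N - 1] = 1.0 + f
        # Right-hand side A'*b = -D'(D y); convolving with [1,-2,1] applies D'.
        rhs = -np.convolve(np.diff(y, 2), [1.0, -2.0, 1.0])
        d = solveh_banded(ab, rhs)       # d is computed directly, no initial guess
        return y + d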
Given the solution d of this problem, we can compute the solution d^ of the same problem with the added constraint One'*d = 0 (where One is the vector of all ones) like this:
    d^ = d - (One'*d/Q) * e
    e  = Inv(A'*A) * One
    Q  = One'*e
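Continuing the sketch above (again my own code), this is one extra banded solve plus a rank-one correction; with smooth() as written, the unconstrained d is smooth(y, f) - y:

    def constrain_zero_sum(d, ab):
        """Impose One'*d = 0, given the upper bands ab of A'*A from smooth()."""
        e = solveh_banded(ab, np.ones(len(d)))   # e = Inv(A'*A) * One
        Q = e.sum()                              # Q = One'*e
        return d - (d.sum() / Q) * e             # d^ = d - (One'*d/Q) * e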
What value should be used for f? Well, a simple approach is to try the procedure on sample curves for various values of f and pick one that looks good. Another is to choose a measure of smoothness, for example the rms of the second derivative, decide what value it should attain, and then search for the f that yields that value. As a rule, the larger f is, the less smooth the smoothed curve will be.
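For the second approach, a simple bisection over f works, since raising f keeps s closer to the rough y and so raises the rms second difference. A sketch reusing smooth() from above; the bracketing bounds and iteration count are arbitrary choices of mine:

    def find_f(y, target_rms, f_lo=1e-8, f_hi=1e8, iters=60):
        """Search for the f whose smoothed curve has rms second difference target_rms."""
        def rms(f):
            r = np.diff(smooth(y, f), 2)
            return np.sqrt(np.mean(r * r))
        for _ in range(iters):
            f_mid = np.sqrt(f_lo * f_hi)   # geometric midpoint: f spans many decades
            if rms(f_mid) < target_rms:
                f_lo = f_mid               # still too smooth: increase f
            else:
                f_hi = f_mid
        return np.sqrt(f_lo * f_hi)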
Some motivation for all this: the aim is to find a "smooth" curve that is "close" to a given one. To do this we need a measure of smoothness (the first term in S) and a measure of closeness (the second term). Why these measures? Well, each is easy to compute, and each is quadratic in the variables (the d[]), which means the problem becomes an instance of linear least squares, for which there are efficient algorithms. Moreover, each term in each sum depends only on nearby values of the variables, which in turn means that the "inverse covariance" A'*A has a banded structure, and so the least-squares problem can be solved efficiently.

Why introduce f? Well, if we had no f (or had f = 0), we could minimize S by setting d[i] = -y[i], getting the perfectly smooth curve s[] = 0, which has nothing to do with the curve y. On the other hand, if f is gigantic, then minimizing S means concentrating on the second term and setting d[i] = 0, so our "smoothed" curve is just the original. The hope is that as f varies, the corresponding solutions range between very smooth but far from y (small f) and close to y but somewhat rough (large f).
It is often said that the normal equations, whose use I advocate here, are a poor way to solve least-squares problems, and that is generally true. However, for "nice" banded systems such as this one, the loss of stability in using the normal equations is not severe, while the gain in speed is considerable. I have used this approach to smooth curves with many thousands of points in reasonable time.
To see what A is, consider the case where we have 4 points. Then our expression for S reduces to:
sqr( s[2] - 2*s[1] + s[0]) + sqr( s[3] - 2*s[2] + s[1]) + f*(d[0]*d[0] + .. + d[3]*d[3]).
If we substitute s[i] = y[i] + d[i], we obtain, for example,
s[2] - 2*s[1] + s[0] = d[2]-2*d[1]+d[0] + y[2]-2*y[1]+y[0]
and so we see that for this to be sqr( ||A*d - b||) we must take (note the sqrt(f): the penalty in S is f*sqr(d[i]), so the matrix must carry its square root)

    A = (    1       -2        1        0    )      b = ( -(y[2]-2*y[1]+y[0]) )
        (    0        1       -2        1    )          ( -(y[3]-2*y[2]+y[1]) )
        ( sqrt(f)     0        0        0    )          (          0          )
        (    0     sqrt(f)     0        0    )          (          0          )
        (    0        0     sqrt(f)     0    )          (          0          )
        (    0        0        0     sqrt(f) )          (          0          )
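As a quick sanity check of this construction (my own snippet; the data values are arbitrary), one can form A and b densely for the 4-point case and confirm that the least-squares solution matches the normal-equations one:

    import numpy as np

    N, f = 4, 2.0
    y = np.array([0.0, 1.0, 4.0, 9.0])
    D = np.array([[1.0, -2.0, 1.0, 0.0],       # second-difference rows
                  [0.0, 1.0, -2.0, 1.0]])
    A = np.vstack([D, np.sqrt(f) * np.eye(N)]) # note sqrt(f), as above
    b = np.concatenate([-D @ y, np.zeros(N)])
    d = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares solution
    d2 = np.linalg.solve(A.T @ A, A.T @ b)     # d = Inv(A'*A) * A'*b
    assert np.allclose(d, d2)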
In an implementation, though, you probably wouldn't want to form A and b explicitly, since they are used only to build the normal-equations terms A'*A and A'*b; it is easier to accumulate those directly (which is what the first sketch above does).