Algorithm for calculating the distance of the sum of squares of the rolling window from a given line function

For the row function y = a*x + b ( a and b are previously known constants), it is easy to calculate the distance between the squares between the line and the window of the samples (1, Y1), (2, Y2), ..., (n, Yn) (where Y1 is the oldest sample and Yn is the newest):

 sum((Yx - (a*x + b))^2 for x in 1,...,n) 

I need a quick algorithm to calculate this value for a rolling window (length n ). I cannot re-view all the samples in the window every time a new sample arrives.
Obviously, some state needs to be saved and updated for each new sample that enters the window, and each old sample leaves the window.
Note that when the sample leaves the window, the indices of the remaining samples also change - each Yx becomes Y (x-1). Therefore, when a sample leaves the window, each other sample in the window adds a new value to the new sum: (Yx - (a*(x-1) + b))^2 instead of (Yx - (a*x + b))^2 .

Is there a known algorithm for calculating this? If not, can you think about it? (It is normal to have some errors due to first order linear approximations).

+4
source share
2 answers

If you expand the term (Yx - (a*x + b))^2 , the terms are broken into three parts:

  • The conditions are only a , x and b . They produce some constant when summed over n and can be ignored.
  • Conditions only Yx and b . They can be handled in the style of a boxcar integrator, as described by @Xion.
  • One term -2*Yx*a*x . -2*a is a constant, so ignore this part. Consider the partial sum S = Y1*1 + Y2*2 + Y3*3 ... Yn*n . Given Y1 and the current amount R = Y1 + Y2 + ... + Yn , you can find S - R , which eliminates Y1*1 and reduces each of the other members, leaving you with Y2*1 + Y3*2 + ... + Yn*(n-1) . Now update the current sum R as for (2), subtracting Y1 and adding Y(n+1) . Add the new term Yn*n to S

Now just add all these partial terms.

+1
source

Wouldn't a simple approach do the trick? ...

By β€œdirect” I mean maintaining a queue of samples. After a new sample arrives, you must:

  • enter the oldest sample from the queue
  • subtract its distance from your amount.
  • add a new sample to the queue
  • calculate its distance and add it to your amount.

As for time, everything here is O (1), if the queue is implemented as a linked list or something similar, you will also want to keep the distance with your samples in the queue, so you only calculate it once. Thus, the memory usage is 3 floats per sample - O (n).

+2
source

Source: https://habr.com/ru/post/1389565/


All Articles