Little Hardcore: Do you know any parallel modified moving average algorithm?

I want to calculate a moving average quickly, but not with a sequential algorithm. I want to use a parallel algorithm, but I have not found one yet.

The best algorithm I have found is sequential: a modified moving average used for measuring computer performance:

 new_avg = alfa(new_time, previous_time) * new_value + (1 - alfa(new_time, previous_time)) * previous_avg
 alfa(new_time, previous_time) = 1 - exp(-(new_time - previous_time) / moving_period)
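The update rule above can be sketched in Python as follows. This is only an illustration of the formula as written in the question; the function name `update_average` and the sample values are assumptions made for the example.

```python
import math

def update_average(previous_avg, previous_time, new_value, new_time, moving_period):
    """One step of the time-weighted exponential average from the question."""
    # alfa approaches 1 as the gap between the two samples grows,
    # so a sample that arrives after a long pause dominates the average.
    alfa = 1.0 - math.exp(-(new_time - previous_time) / moving_period)
    return alfa * new_value + (1.0 - alfa) * previous_avg

# Hypothetical (time, value) samples, processed strictly in time order.
avg, last_time = 10.0, 0.0
for new_time, new_value in [(1.0, 12.0), (2.5, 9.0), (3.0, 11.0)]:
    avg = update_average(avg, last_time, new_value, new_time, moving_period=5.0)
    last_time = new_time
```

Only `avg` and `last_time` are carried between steps, which matches the memory constraint stated further down, but each step depends on the result of the previous one.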

Some other algorithms are good too, but none of the ones I have found are parallel.

This is a difficult question and I need help.

Consider that I want to count events that arrive in random time order: early events may arrive later than late events. You may assume that an early event can be skipped or treated as outdated once later events have been processed (or after some timeout). Do not assume that events arrive in sequential time order, or that events from the same time arrive at the same time.

I do not want an algorithm that has to remember many samples (especially not all of them). It should remember only the time and the previous average value, perhaps plus some additional value, but not the samples themselves. The algorithm may make minor errors and need not be perfect if that buys a performance improvement.

It would be very good if it could work on fragments (shards) of the data, but that is not required.

+4
2 answers

A moving average over events that arrive in sequence can be computed as follows:

 newMovingAverage = ((MovingAverage * (n - 1)) + newSample) / n 

where n determines how strongly each new sample affects the moving average: the larger n, the smaller the influence. As new samples arrive, older samples have less and less impact on the moving average.
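As a minimal sketch, the update above translates directly into Python. The function name `update_moving_average` and the sample values are chosen for this example only.

```python
def update_moving_average(moving_average, new_sample, n):
    """Blend one new sample into the running average with weight 1/n."""
    return (moving_average * (n - 1) + new_sample) / n

# Feeding a constant signal of 4.0 into an average that starts at 0.0:
avg = 0.0
for sample in [4.0, 4.0, 4.0, 4.0]:
    avg = update_moving_average(avg, sample, n=2)
# avg moves 2.0 -> 3.0 -> 3.5 -> 3.75, approaching 4.0
```

With n=2 each sample closes half of the remaining gap; a larger n would approach the signal more slowly.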

When samples arrive out of sequence, you can try to reproduce this behavior by letting the age of the sample determine how much effect it should have on the moving average. This could be done, for example, as follows:

 influence = (1 + sampleAge)^2 * n
 newMovingAverage = ((MovingAverage * (influence - 1)) + newSample) / influence

Here sampleAge determines how much newSample should affect the moving average: the older the sample, the larger influence becomes and the less the average changes.
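The age-weighted update can be sketched the same way. The answer does not say in which unit sampleAge is measured, so this example simply assumes a non-negative number where 0 means "just arrived"; the function name is likewise hypothetical.

```python
def update_with_age(moving_average, new_sample, sample_age, n):
    """Blend in a possibly late sample; older samples get less weight."""
    # Assumption: sample_age >= 0, in whatever time unit the caller uses.
    influence = (1 + sample_age) ** 2 * n
    return (moving_average * (influence - 1) + new_sample) / influence

fresh = update_with_age(10.0, 20.0, sample_age=0, n=4)  # same as the in-order update
late = update_with_age(10.0, 20.0, sample_age=1, n=4)   # moves the average much less
```

A sample with age 0 is treated exactly like an in-order sample, while the quadratic term quickly suppresses stale ones.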

+5

The possibility of having a parallel algorithm will depend on the nature of the moving average that you are using.

The algorithm that you show in your question is exponential smoothing. With it, the first data value affects every calculated average value. The influence of the first value decreases with each new data point, but even the last average value in the sequence still depends slightly on the first data point.
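This dependence can be made explicit by unrolling the recurrence. For simplicity the sketch below assumes a constant smoothing factor $\alpha$ (in the question it varies with the time gap), samples $x_1, \dots, x_k$, and an initial average $\bar{x}_0$; these symbols are introduced here only for illustration.

```latex
\bar{x}_k = \alpha x_k + (1-\alpha)\,\bar{x}_{k-1}
          = \alpha \sum_{j=0}^{k-1} (1-\alpha)^{j}\, x_{k-j} + (1-\alpha)^{k}\,\bar{x}_0
```

The first sample $x_1$ enters $\bar{x}_k$ with weight $\alpha(1-\alpha)^{k-1}$, which shrinks as $k$ grows but never reaches zero.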

This type of moving average cannot be parallelized, because you cannot calculate the average without using (explicitly or implicitly) all of the data received before it.

However, the Wikipedia article on moving averages summarizes a number of moving average methods, some of which are easily implemented in parallel.

For example, a simple moving average takes the following form (for odd n) **:

 n2 = int(n/2)
 moving_average[i] = (data[i-n2] + data[i-n2+1] + ... + data[i] + ... + data[i+n2-1] + data[i+n2]) / n

This method does not use any data earlier than int(n/2) points before i (or later than int(n/2) points after it) to calculate the moving average at point i. You can therefore calculate the moving average of a data set of m elements in parallel with p threads by dividing the m elements into p subsequences, each of which overlaps the next and previous subsequence (except at the first and last) by int(n/2) data points, and letting each thread calculate the moving average for its own subsequence.
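The partitioning described above could look roughly like this in Python. It is a sketch under a few assumptions: n is odd, only indices with a full window are returned, and the helper names are made up for the example. Each worker reads int(n/2) points beyond both ends of its own index range, which is the overlap between neighboring subsequences.

```python
from concurrent.futures import ThreadPoolExecutor

def centered_average_range(data, n, start, stop):
    """Centered moving average (odd n) for indices start..stop-1."""
    n2 = n // 2
    return [sum(data[i - n2 : i + n2 + 1]) / n for i in range(start, stop)]

def parallel_centered_average(data, n, p):
    """Split the valid indices into at most p chunks and average each in a worker."""
    n2 = n // 2
    first, last = n2, len(data) - n2          # indices that have a full window
    count = max(last - first, 0)
    step = -(-count // p) if count else 1     # ceiling division, never zero
    bounds = [(s, min(s + step, last)) for s in range(first, last, step)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = pool.map(lambda b: centered_average_range(data, n, *b), bounds)
        return [value for part in parts for value in part]

data = [1, 2, 3, 4, 5, 6, 7]
result = parallel_centered_average(data, n=3, p=2)  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

The thread pool only illustrates the decomposition. In CPython, threads generally do not speed up pure-Python arithmetic, so a process pool or a vectorized library would be the more practical choice for real workloads.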

You can find an efficient sequential implementation of this algorithm (applicable to each thread of a parallel implementation) in the question "Simple summable sum / offset compensation" and its answer. That method calculates a trailing moving average rather than the (possibly preferable) centered moving average shown above. That is, it stores the value that I calculated above for moving_average[i] in moving_average[i+n2] instead.
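For illustration, a trailing moving average is commonly computed with a running sum, as in the sketch below. This is a generic version written for this example and is not necessarily the method from the linked answer; the function name is hypothetical.

```python
def trailing_moving_average(data, n):
    """Trailing average of the last n points, one add and one subtract per step."""
    averages = []
    window_sum = 0.0
    for i, value in enumerate(data):
        window_sum += value
        if i >= n:
            window_sum -= data[i - n]        # drop the point that left the window
        if i >= n - 1:
            averages.append(window_sum / n)  # window is full from index n-1 on
    return averages

trailing = trailing_moving_average([1, 2, 3, 4, 5, 6, 7], n=3)
# trailing[k] is the average ending at index k + 2, i.e. the centered value at index k + 1
```

Over very long floating-point streams the running sum can accumulate rounding error, so a periodic recomputation of the window sum may be worth considering.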

** This leaves aside the possibility that the data may arrive at irregular time intervals. The method you cited addresses this issue, and the other methods can handle it in a similar way.

+4

Source: https://habr.com/ru/post/1479608/