Sliding window over time - data structure and garbage collection

I am trying to implement something in moving average lines.

This system does not guarantee the number of integers over a period of time. I need to calculate the average for each period. Therefore, I cannot simply glide over the list of integers by quantity, since that would not be relative to time.

I can record each value with its associated time. We will have a ton of data passing through the system, so itโ€™s important to โ€œgarbage collectโ€ the old data.

It may also be important to note that I need to save the average value to disk after the end of each period. However, they may partially overlap between storing data on disk and retrieving data from a new period.

What are some efficient data structures that I can use to store, slide and garbage collect this type of data?

+6
source share
1 answer

Description of the problem and conflict of questions: what is described is not a moving average, since the average value for each time period is different. ("I need to calculate the average for each period"). So this allows a really trivial solution:

For each period, maintain a count and a sum of observations. At the end of the period, compute the average 

I suspect that I really need something like: every second (calculation period), I want to know the average observation for the last minute (aggregation period).

This can be solved simply using a circular bucket buffer, each of which represents a value in one calculation period. There will be an aggregation period / computation period such buckets. Again, each bucket contains an invoice and an amount. In addition, the current amount / amount and the cumulative total amount / account are retained. Each observation is added to the current amount / amount.

At the end of each calculation period:

  • subtract the amount / account for the (circular) first period from the total amount / account
  • add current amount / account to total amount / account
  • report average based on total / bill
  • replace the values โ€‹โ€‹of the first period with the current amount / counter
  • clear current amount / counter
  • push the beginning of the circular buffer.

If you really need to be able to calculate at any time in all the average previous observations for a given period, you will need a more complex data structure, mostly an expandable circular buffer. However, such accurate calculations are rarely really necessary, and an approximation in accordance with the above algorithm is usually adequate for data purposes and much more stable in the long run for memory management, since its memory requirements are fixed from the very beginning.

+3
source

Source: https://habr.com/ru/post/955997/


All Articles