Find moving averages of any length below the threshold

Question


I want to find all runs in a data vector where the average value is below some threshold value. For instance, for the data set

d <- c(0.16, 0.24, 0.15, 0.17, 0.37, 0.14, 0.12, 0.08)

If I wanted to find all runs with an average value less than or equal to 0.20, the run at indices 1-6 would not be identified (average 0.205), but 1-7 (average 0.193) would be, among others.
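As a quick check of the figures above, here is a minimal sketch in base R. The name `run_mean` is a hypothetical helper introduced only for illustration; it is not part of the original post.

```r
d <- c(0.16, 0.24, 0.15, 0.17, 0.37, 0.14, 0.12, 0.08)
threshold <- 0.20

# Hypothetical helper for illustration: mean of the run d[from..to]
run_mean <- function(from, to) mean(d[from:to])

run_mean(1, 6) <= threshold  # FALSE: the mean is 0.205
run_mean(1, 7) <= threshold  # TRUE: the mean is about 0.193
run_mean(2, 8) <= threshold  # TRUE: overlaps 1-7 but is not a subset of it
```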

To make things simpler, I don’t need subsets of runs that have already been determined to average below the threshold. That is, following the example, I would not need to check run 1-6 if I already knew that 1-7 is below the threshold. But I would still need to check other runs that overlap with 1-7 and are not a subset of it (e.g. 2-8).

In trying to answer this question, I see that I could start with something like this, for example:

hour <- c(1, 2, 3, 4, 5, 6, 7, 8)
value <- c(0.16, 0.24, 0.15, 0.17, 0.37, 0.14, 0.12, 0.08)
d <- data.frame(hour, value)

rng <- rev(1:length(d$value))

data.table::setDT(d)[, paste0('MA', rng) := lapply(rng, function(x) 
    zoo::rollmeanr(value, x, fill = NA))][]

And then search through all the generated columns for the values below the threshold.

But this method is not very efficient for what I want to achieve (it also looks through all the subsets of runs that are already known to be under the threshold), and it does not scale to large data sets (about 500 thousand records, which would mean a 500k x 500k matrix).
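To put a rough number on that matrix, here is a back-of-the-envelope sketch. It assumes every cell is stored as an 8-byte double and ignores any object overhead, so it is only an order-of-magnitude estimate.

```r
n <- 5e5                 # number of records mentioned above
bytes_per_double <- 8    # R stores numeric values as 8-byte doubles

# Assumption: a dense n x n table of doubles, one column per window length
total_bytes <- n * n * bytes_per_double
total_bytes / 1e12       # 2, i.e. about 2 terabytes (decimal)
```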

Instead, it would be sufficient to record the indices of the runs under the threshold in a separate variable. This would at least avoid creating a 500k x 500k matrix. But I'm not sure how to check whether the output of rollmeanr() is below a value and, if so, get the corresponding indices.


r

Manselpotamus Jun 26 '17 at 16:07


Scarabee · Accepted Answer · 2017-06-27T12:41:14+0000

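First, note that mean(x) <= threshold if and only if sum(x - threshold) <= 0.

Second, finding runs of d with a non-positive sum is the same as finding pairs of positions in c(0, cumsum(d)) where the later value is no larger than the earlier one, because the sum of a run is the difference between two cumulative sums.

So: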
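The two observations above can be checked on the question's data with a short base R sketch. The indices `i` and `j` are illustrative names for the first and last element of a run; the sketch only verifies the equivalence for two runs and is not a proof.

```r
d <- c(0.16, 0.24, 0.15, 0.17, 0.37, 0.14, 0.12, 0.08)
threshold <- 0.20

# Cumulative sums of the shifted data, with a leading 0
s <- c(0, cumsum(d - threshold))

# For the run d[i..j], sum(d[i:j] - threshold) equals s[j + 1] - s[i],
# so "mean <= threshold" is the same test as "s[j + 1] <= s[i]".
i <- 1; j <- 7
by_mean_17   <- mean(d[i:j]) <= threshold  # TRUE
by_cumsum_17 <- s[j + 1] <= s[i]           # TRUE

i <- 1; j <- 6
by_mean_16   <- mean(d[i:j]) <= threshold  # FALSE
by_cumsum_16 <- s[j + 1] <= s[i]           # FALSE
```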

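library(dplyr)  # needed for the filter / group_by / summarise step below

# d is the numeric vector from the question (not the data.frame)
threshold <- 0.20
s <- c(0, cumsum(d - threshold))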

# potential start points of *maximal* runs:
B <- which(!duplicated(cummax(s)))
# potential end points:
E <- which(!duplicated(rev(cummin(rev(s))), fromLast = TRUE))

# end point associated with each start point
# (= for each point of B, we find the *last* point of E whose s value is
# no larger; subtracting 1 converts a position in s to an index in d)
E2 <- E[findInterval(s[B], s[E])] - 1

# potential maximal runs:
df <- data.frame(begin = B, end = E2)

# now we just have to filter out lines with begin > end, and keep only the 
# first begin for each end - for instance using dplyr:
df %>%
  filter(begin <= end) %>%
  group_by(end) %>%
  summarise(begin = min(begin))
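For readers who prefer not to depend on dplyr, the same steps can be sketched as a single base R function. The function name `maximal_runs` and the added `mean` column are assumptions made for illustration; they are not part of the answer above.

```r
# Hypothetical wrapper (not from the answer): same steps, base R only
maximal_runs <- function(d, threshold) {
  s <- c(0, cumsum(d - threshold))
  # start candidates: positions where s reaches a strictly new running maximum
  B <- which(!duplicated(cummax(s)))
  # end candidates: last position of each distinct suffix minimum
  E <- which(!duplicated(rev(cummin(rev(s))), fromLast = TRUE))
  # last end candidate whose s value is no larger, as an index into d
  E2 <- E[findInterval(s[B], s[E])] - 1
  keep <- B <= E2
  B <- B[keep]
  E2 <- E2[keep]
  # B is increasing, so the first row for each end has the smallest begin
  first <- !duplicated(E2)
  data.frame(begin = B[first], end = E2[first])
}

d <- c(0.16, 0.24, 0.15, 0.17, 0.37, 0.14, 0.12, 0.08)
runs <- maximal_runs(d, 0.20)
# Attach the mean of each reported run as a sanity check
runs$mean <- vapply(seq_len(nrow(runs)),
                    function(k) mean(d[runs$begin[k]:runs$end[k]]),
                    numeric(1))
print(runs)  # one run: begin 1, end 8, mean 0.17875
```

On the example data the whole vector averages below 0.20, so the only maximal run is 1-8; every shorter qualifying run, such as 1-7 or 2-8, is a subset of it.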
