I want to find all runs of consecutive values in a data vector whose average is below some threshold. For instance, for the data set
d <- c(0.16, 0.24, 0.15, 0.17, 0.37, 0.14, 0.12, 0.08)
If I wanted to find all runs with an average value less than or equal to 0.20, the run at indices 1-6 would not be identified (average 0.205), but 1-7 (average 0.193) would be, among others.
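The two averages quoted above can be checked directly in base R, which uses 1-based indices:

```r
d <- c(0.16, 0.24, 0.15, 0.17, 0.37, 0.14, 0.12, 0.08)
threshold <- 0.20

mean(d[1:6])               # 1.23 / 6 = 0.205
mean(d[1:7])               # 1.35 / 7, about 0.193
mean(d[1:6]) <= threshold  # FALSE: run 1-6 is not identified
mean(d[1:7]) <= threshold  # TRUE: run 1-7 is identified
```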
To keep things simpler, I don't need runs that are contained in a run already known to be below the threshold. That is, in this example, I would not need to check 1-6 if I already knew that 1-7 is below the threshold. But I would still need to check other runs that overlap 1-7 without being contained in it (e.g. 2-8).
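The skipping rule above amounts to a containment test: a candidate run start..end can be skipped only if it lies entirely inside a run already found to be below the threshold, while runs that merely overlap one must still be checked. A minimal sketch, where `is_contained()` is a hypothetical helper name:

```r
# Hypothetical helper: TRUE if run [start, end] lies inside any known run
# [found_start[k], found_end[k]]; runs that only overlap give FALSE.
is_contained <- function(start, end, found_start, found_end) {
  any(found_start <= start & end <= found_end)
}

found_start <- 1  # run 1-7 is already known to be below the threshold
found_end <- 7

is_contained(1, 6, found_start, found_end)  # TRUE: 1-6 can be skipped
is_contained(2, 8, found_start, found_end)  # FALSE: 2-8 must still be checked
```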
As a starting point, I could do something like this:
hour <- c(1, 2, 3, 4, 5, 6, 7, 8)
value <- c(0.16, 0.24, 0.15, 0.17, 0.37, 0.14, 0.12, 0.08)
d <- data.frame(hour, value)
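# Window widths from length(value) down to 1
rng <- rev(seq_along(d$value))
# One right-aligned rolling-mean column per width: MA8, MA7, ..., MA1
# (row r of MAk is mean(value[(r - k + 1):r]), NA when r < k)
data.table::setDT(d)[, paste0('MA', rng) := lapply(rng, function(x)
  zoo::rollmeanr(value, x, fill = NA))][]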
Then I would search all the generated columns for values below the threshold.
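The search step does not strictly need the wide table: each window width can be processed in turn, keeping only the (start, end) pairs whose mean is at or below the threshold. With `zoo::rollmeanr(x, k, fill = NA)` the per-width check would be `which(zoo::rollmeanr(x, k, fill = NA) <= threshold)`, which returns end positions (`which()` drops the NAs). The base-R sketch below, with the hypothetical name `runs_below_by_width()`, computes the same right-aligned means from differences of a cumulative sum (equal up to floating-point rounding) so that it needs no packages:

```r
# Hypothetical helper: every run x[start:end] whose mean is <= threshold,
# found one window width at a time so no n x n table is ever stored.
runs_below_by_width <- function(x, threshold) {
  n <- length(x)
  cs <- c(0, cumsum(x))           # cs[r + 1] = sum(x[1:r])
  out <- vector("list", n)
  for (k in seq_len(n)) {
    ends <- k:n
    # Right-aligned rolling mean of width k ending at each position r
    m <- (cs[ends + 1] - cs[ends - k + 1]) / k
    hit <- ends[m <= threshold]   # end positions of qualifying runs
    out[[k]] <- data.frame(start = hit - k + 1L, end = hit)
  }
  do.call(rbind, out)
}

value <- c(0.16, 0.24, 0.15, 0.17, 0.37, 0.14, 0.12, 0.08)
runs <- runs_below_by_width(value, 0.20)
runs[runs$start == 1, ]
```

This keeps memory proportional to the number of qualifying runs, but it still does O(n^2) work and also lists runs contained in longer ones, so it would likely be too slow for 500k records.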
But this approach is not very efficient for what I want: it also evaluates every run contained in a run already known to be below the threshold, and it does not scale to large data sets (about 500 thousand records in my case, which would mean a 500k x 500k matrix).
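For scale: a dense 500k x 500k table of doubles (8 bytes each in R) would need about 2 TB, before counting any intermediate copies:

```r
n <- 5e5
bytes <- n * n * 8   # one 8-byte double per cell
bytes / 1e12         # 2 (terabytes)
```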
Instead, it would be enough to store the indices of the runs below the threshold in a separate variable. That would at least avoid creating a 500k x 500k matrix. But I'm not sure how to check whether the output of rollmeanr() is below the threshold and, if so, get the corresponding indices.
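Under the assumption that only maximal runs are needed (runs not contained in a longer qualifying run, as described above), a sketch that swaps `rollmeanr()` for a prefix-sum technique avoids both the matrix and the per-width scan. With P[k] = sum(x[1:k] - threshold) and P[0] = 0, the run i..j has a mean at or below the threshold exactly when P[j] <= P[i - 1]. The suffix minimum of P is non-decreasing, so base R's `findInterval()` can binary-search it for the furthest qualifying end of every start at once; a start is then kept only if its run reaches past every earlier start's run. `maximal_runs_below()` is a hypothetical name, the input is assumed to contain no NA, and runs whose mean is within rounding error of the threshold may go either way.

```r
# Hypothetical helper: maximal runs of x whose mean is <= threshold.
# A run is "maximal" if no longer qualifying run contains it.
maximal_runs_below <- function(x, threshold) {
  n <- length(x)
  if (n == 0L) return(data.frame(start = integer(0), end = integer(0)))
  # P[k] = sum(x[1:k] - threshold); run i..j qualifies iff P[j] <= P[i - 1]
  P <- cumsum(x - threshold)
  P_prev <- c(0, P[-n])          # P[i - 1] for every start i, with P[0] = 0
  # Suffix minimum S[j] = min(P[j:n]) is non-decreasing, so it can be searched
  S <- rev(cummin(rev(P)))
  # Largest j with S[j] <= P[i - 1]: the furthest qualifying end for start i
  end <- findInterval(P_prev, S)
  ok <- end >= seq_len(n)        # FALSE when no run starting at i qualifies
  # Keep start i only if its run reaches past every earlier start's run;
  # otherwise [i, end[i]] lies inside an earlier qualifying run
  prev_max <- c(0, cummax(end)[-n])
  keep <- ok & end > prev_max
  data.frame(start = which(keep), end = end[keep])
}

d <- c(0.16, 0.24, 0.15, 0.17, 0.37, 0.14, 0.12, 0.08)
maximal_runs_below(d, 0.20)
maximal_runs_below(d, 0.15)
```

For the example vector, the threshold 0.20 yields the single run 1-8 (the whole vector averages 0.17875), and 0.15 yields the runs 3-3 and 6-8. The work is O(n log n) and the memory O(n), so it should be practical for around 500k records.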