I regularly compute rolling statistics on time series (in particular, rolling means), and I was surprised to find that rollmean is noticeably faster than rollapply, and that calling rollmean with align = 'right' is faster than using the rollmeanr wrapper.
How do they achieve this speed? And why is some of it lost when using the rollmeanr() wrapper?
Some background: I had been using rollapplyr(x, n, function(X) mean(X)), but recently came across some examples that use rollmean. The docs suggest that rollapplyr(x, n, mean) (note: without the function(X) wrapper around the argument) uses rollmean, so I didn't expect much difference in performance. However, rbenchmark shows noticeable differences.
    require(zoo)
    require(rbenchmark)

    x <- rnorm(1e4)

    r1 <- function() rollapplyr(x, 3, mean)  # uses rollmean
    r2 <- function() rollapplyr(x, 3, function(x) mean(x))
    r3 <- function() rollmean(x, 3, na.pad = TRUE, align = 'right')
    r4 <- function() rollmeanr(x, 3, align = "right")

    bb <- benchmark(r1(), r2(), r3(), r4(),
                    columns = c('test', 'elapsed', 'relative'),
                    replications = 100,
                    order = 'elapsed')
    print(bb)
I was surprised to find that rollmean(x, n, align = 'right') was noticeably faster than the others, and roughly 40 times faster than my rollapplyr(x, n, function(X) mean(X)) approach.
      test elapsed relative
    3 r3()    0.74    1.000
    4 r4()    0.86    1.162
    1 r1()    0.98    1.324
    2 r2()   27.53   37.203
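One possible explanation for the gap between r1() and r2() is that a fast path is only taken when the function passed in is literally `mean`, so wrapping it in an anonymous function forces one R-level call per window. The sketch below illustrates that idea in base R only. `roll_dispatch` is a hypothetical helper written for this illustration, and the `identical()` check is an assumption about how such a dispatch could work, not zoo's actual source.

```r
# Hypothetical dispatcher: NOT zoo's code, only an illustration of the idea.
roll_dispatch <- function(x, k, FUN) {
  if (identical(FUN, mean)) {
    # Fast path: one vectorised pass over cumulative sums, no per-window call.
    cs <- cumsum(c(0, x))
    value <- (cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]) / k
    list(path = "fast", value = value)
  } else {
    # Generic path: FUN is called once per window (length(x) - k + 1 calls).
    n <- length(x) - k + 1
    value <- vapply(seq_len(n),
                    function(i) FUN(x[i:(i + k - 1)]),
                    numeric(1))
    list(path = "generic", value = value)
  }
}

x <- c(1, 2, 3, 4, 5)
a <- roll_dispatch(x, 3, mean)                  # the function object `mean` itself
b <- roll_dispatch(x, 3, function(v) mean(v))   # a wrapper around mean

a$path                       # "fast"
b$path                       # "generic"
all.equal(a$value, b$value)  # TRUE: same numbers, different code path
```

Both calls return the same values (2, 3, 4); only the amount of work per window differs, which would be consistent with the wrapper version being far slower.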
The difference seems to increase as the size of the dataset grows. I changed only the size of x (to rnorm(1e5)) in the code above and reran the benchmark, and the gap between these functions was even larger.
      test elapsed relative
    3 r3()   13.33    1.000
    4 r4()   17.43    1.308
    1 r1()   19.83    1.488
    2 r2()  279.47   20.965
And for x <- rnorm(1e6):
      test elapsed relative
    3 r3()   44.23    1.000
    4 r4()   54.30    1.228
    1 r1()   65.30    1.476
    2 r2() 2473.35   55.920
How did they do it? In addition, is this the best solution? Sure, it's fast, but is there an even faster way to do this?
(Note: in general, my time series are almost always xts objects - is this important?)
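As a point of comparison for the "even faster way" question, here is a minimal base R sketch of a right-aligned rolling mean built from cumulative sums, checked against `stats::filter`. `roll_mean_cumsum` is a hypothetical name used only for this illustration; it assumes a plain numeric vector with no NA values, and it is not claimed to be what zoo does internally.

```r
# Right-aligned rolling mean via cumulative sums (base R only).
# roll_mean_cumsum is a hypothetical helper, not part of zoo or xts.
# Assumes x is a plain numeric vector without NA values.
roll_mean_cumsum <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 1, k <= n)
  cs <- cumsum(c(0, x))  # cs[i + 1] holds sum(x[1:i])
  # Sum of each window = difference of two cumulative sums.
  means <- (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k
  # Pad the first k - 1 positions with NA, as a right-aligned result needs.
  c(rep(NA_real_, k - 1), means)
}

set.seed(1)
x <- rnorm(1e4)

fast <- roll_mean_cumsum(x, 3)
# Reference: one-sided moving average from the stats package.
ref <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 1))

all.equal(fast, ref)  # compares values within floating-point tolerance
```

The trade-off with this approach is that a single NA makes every later window NA, and rounding error can accumulate in the running sum on very long series, so it would need benchmarking and validation on real data before replacing rollmean.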