What makes rollmean faster than rollapply (code-wise)?

I regularly compute rolling statistics on time series (in particular, means), and I was surprised to find that rollmean is noticeably faster than rollapply, and that the align = 'right' methods are faster than the rollmeanr wrappers.

How did they achieve this speed? And why is some of it lost when using the rollmeanr() wrapper?

Some background: I had been using rollapplyr(x, n, function(X) mean(X)), but recently I came across some examples using rollmean. The docs suggest that rollapplyr(x, n, mean) (note: without the function(X) part of the argument) uses rollmean, so I didn't expect a big difference in performance; however, rbenchmark showed noticeable differences.
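One plausible way a rolling function could treat `mean` specially, while an anonymous wrapper around it gets no such treatment, is an identity check on the function object. The sketch below is hypothetical: `roll_dispatch` is an illustrative name, not part of zoo, and this is not zoo's actual source. It only shows that `identical()` distinguishes `mean` itself from a new closure such as `function(X) mean(X)`, so any shortcut keyed on `FUN` being `mean` would be skipped for the wrapped form.

```r
# Hypothetical dispatcher, for illustration only (not zoo's code).
# It picks a specialised path only when FUN is exactly base::mean.
roll_dispatch <- function(FUN) {
  if (identical(FUN, mean)) {
    "specialised rolling-mean path"
  } else {
    "generic path: call FUN once per window"
  }
}

roll_dispatch(mean)                 # the function object itself
roll_dispatch(function(X) mean(X))  # a new closure that merely wraps mean
```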

```r
require(zoo)
require(rbenchmark)

x <- rnorm(1e4)

r1 <- function() rollapplyr(x, 3, mean)  # uses rollmean
r2 <- function() rollapplyr(x, 3, function(x) mean(x))
r3 <- function() rollmean(x, 3, na.pad = TRUE, align = 'right')
r4 <- function() rollmeanr(x, 3, align = "right")

bb <- benchmark(r1(), r2(), r3(), r4(),
                columns = c('test', 'elapsed', 'relative'),
                replications = 100,
                order = 'elapsed')
print(bb)
```

I was surprised to find that rollmean(x, n, align = 'right') was noticeably faster than the other variants, and roughly 37 times faster than my approach, rollapplyr(x, n, function(X) mean(X)):

```
  test elapsed relative
3 r3()    0.74    1.000
4 r4()    0.86    1.162
1 r1()    0.98    1.324
2 r2()   27.53   37.203
```

The gap seems to grow with the size of the dataset. I changed only the size of x (to rnorm(1e5)) in the code above and repeated the test, and the absolute differences between these functions became even larger:

```
  test elapsed relative
3 r3()   13.33    1.000
4 r4()   17.43    1.308
1 r1()   19.83    1.488
2 r2()  279.47   20.965
```

And for x <- rnorm(1e6):

```
  test elapsed relative
3 r3()   44.23    1.000
4 r4()   54.30    1.228
1 r1()   65.30    1.476
2 r2() 2473.35   55.920
```

How did they do it? Also, is this the best solution? It is certainly fast, but is there an even faster way?

(Note: my time series are almost always xts objects. Does that matter?)

1 answer

Computing a rolling mean is faster than computing a general rolling function, because the mean is simpler to compute. For a general rolling function you have to evaluate the function on every window from scratch, over and over again, which is unnecessary for the mean because of the simple identity:

$$\frac{a_2 + a_3 + \dots + a_n}{n-1} = \frac{a_1 + a_2 + \dots + a_{n-1}}{n-1} + \frac{a_n - a_1}{n-1}$$

You can see how this is exploited by viewing the source with getAnywhere(rollmean.zoo).
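To make the identity concrete, here is a minimal base-R sketch contrasting the two strategies for a right-aligned, NA-padded window of width k. It is not zoo's or caTools' code, and `roll_generic` and `roll_mean_incremental` are hypothetical names. The generic version calls FUN on every window; the mean-specific version updates the previous window's mean by (x[i] - x[i - k]) / k, which is the identity above with k in place of n - 1.

```r
# Generic approach: evaluate FUN on every window from scratch.
# Work is about k per window, plus one R function call per window.
roll_generic <- function(x, k, FUN) {
  n <- length(x)
  stopifnot(k >= 1, n >= k)
  out <- rep(NA_real_, n)            # right-aligned, NA-padded
  for (i in seq.int(k, n)) {
    out[i] <- FUN(x[(i - k + 1):i])
  }
  out
}

# Mean-specific approach: reuse the previous window's mean,
# dropping the value that leaves and adding the value that enters.
roll_mean_incremental <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 1, n >= k)
  out <- rep(NA_real_, n)
  m <- mean(x[1:k])                  # mean of the first full window
  out[k] <- m
  if (n > k) {
    for (i in seq.int(k + 1, n)) {
      m <- m + (x[i] - x[i - k]) / k # the identity: O(1) work per step
      out[i] <- m
    }
  }
  out
}

set.seed(1)
x <- rnorm(20)
all.equal(roll_generic(x, 3, mean), roll_mean_incremental(x, 3))
```

Both functions are plain R loops, so this sketch is not itself fast; it only shows why the mean needs constant work per step instead of recomputing each window. A vectorised or compiled implementation is what turns that into real speed. One trade-off of running updates like this is that floating-point rounding error can accumulate over very long series.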

If you want an even faster moving average, use runmean from caTools, which is implemented in C and is therefore much faster. It also scales better, so its advantage grows as the size of the data increases.

```r
library(microbenchmark)
library(caTools)
library(zoo)

x <- rnorm(1e4)
microbenchmark(runmean(x, 3, endrule = 'trim', align = 'right'),
               rollmean(x, 3, align = 'right'))
# Unit: microseconds
#                                              expr      min        lq     median        uq       max neval
#  runmean(x, 3, endrule = "trim", align = "right")  631.061  740.0775   847.5915  1020.048  1652.109   100
#                   rollmean(x, 3, align = "right") 7308.947 9155.7155 10627.0210 12760.439 16919.092   100
```

Source: https://habr.com/ru/post/951256/

