Speeding up the calculation of WMA (weighted moving average)

Question

Speeding up the calculation of WMA (weighted moving average)

I am trying to calculate the exponential moving average for 15-day bars, but I want to see the "evolution" of the 15-day EMA bar for each (end) day / bar. So that means I have 15 day bars. When new data arrives daily, I would like to recalculate the EMA using the new information. In fact, I have 15-day bars, and after each day my new 15-day bar starts to grow, and every new bar that comes in should be used to calculate the EMA along with the previous full 15-day bars.

Suppose we start from 2012-01-01 (we have data for each calendar day for this example), at the end of 2012-01-15 we have the first full 15-day bar. After 4 completed full 15 day bars in 2012-03-01, we can begin to calculate 4 EMA bars (EMA (x, n = 4)). At the end of 2012-03-02, we use the information available up to this point and calculate the EMA in 2012-03-02, pretending that OHLC for 2012-03-02 is a 15-day bar. Therefore, we take 4 full bars and the bar in 2012-03-02 and calculate the EMA (x, n = 4). Then we will wait another day, see what happened to the new 15-day bar (see the to.period.cumulative function below for more details) and calculate the new value for the EMA ... And so for the next 15 days and more ... See the EMA.cumulative function below for details ...

Below you will find what I have been able to pick up so far. Performance is unacceptable to me, and I cannot do it faster with my limited knowledge of R.

library(quantmod) do.call.rbind <- function(lst) { while(length(lst) > 1) { idxlst <- seq(from=1, to=length(lst), by=2) lst <- lapply(idxlst, function(i) { if(i==length(lst)) { return(lst[[i]]) } return(rbind(lst[[i]], lst[[i+1]])) }) } lst[[1]] } to.period.cumulative <- function(x, name=NULL, period="days", numPeriods=15) { if(is.null(name)) name <- deparse(substitute(x)) cnames <- c("Open", "High", "Low", "Close") if (has.Vo(x)) cnames <- c(cnames, "Volume") cnames <- paste(name, cnames, sep=".") if (quantmod:::is.OHLCV(x)) { x <- OHLCV(x) out <- do.call.rbind( lapply(split(x, f=period, k=numPeriods), function(x) cbind(rep(first(x[,1]), NROW(x[,1])), cummax(x[,2]), cummin(x[,3]), x[,4], cumsum(x[,5])))) } else if (quantmod:::is.OHLC(x)) { x <- OHLC(x) out <- do.call.rbind( lapply(split(x, f=period, k=numPeriods), function(x) cbind(rep(first(x[,1]), NROW(x[,1])), cummax(x[,2]), cummin(x[,3]), x[,4]))) } else { stop("Object does not have OHLC(V).") } colnames(out) <- cnames return(out) } EMA.cumulative<-function(cumulativeBars, nEMA = 4, period="days", numPeriods=15) { barsEndptCl <- Cl(cumulativeBars[endpoints(cumulativeBars, on=period, k=numPeriods)]) # TODO: This is sloooooooooooooooooow... outEMA <- do.call.rbind( lapply(split(Cl(cumulativeBars), period), function(x) { previousFullBars <- barsEndptCl[index(barsEndptCl) < last(index(x)), ] if (NROW(previousFullBars) >= (nEMA - 1)) { last(EMA(last(rbind(previousFullBars, x), n=(nEMA + 1)), n=nEMA)) } else { xts(NA, order.by=index(x)) } })) colnames(outEMA) <- paste("EMA", nEMA, sep="") return(outEMA) } getSymbols("SPY", from="2010-01-01") SPY.cumulative <- to.period.cumulative(SPY, , name="SPY") system.time( SPY.EMA <- EMA.cumulative(SPY.cumulative) )

My system requires

  user system elapsed 4.708 0.000 4.410

Allowable runtime will be less than one second ... Can this be achieved with pure R?

This post is related to Optimize Moving Average Computation - is this possible? where I did not get any answers. Now I was able to create a reproducible example with a more detailed explanation of what I want to speed up. Hopefully this question now makes more sense.

Any ideas on how to speed it up are much appreciated.

+4

r finance

Samo Jan 03 '12 at 23:20

source share

1 answer

Samo · Accepted Answer · 2012-01-15T21:36:08+0000

I did not find a satisfactory solution for my question using R. So, I took the old tool, c language and results are better than I expected. Thanks for pushing me using these wonderful tools Rcpp, inline, etc. Amazing I think whenever I have performance requirements in the future and cannot be satisfied using R, I will add C to R, and the performance is there. So see below for my code and resolving performance issues.

 # How to speedup cumulative EMA calculation # ############################################################################### library(quantmod) library(Rcpp) library(inline) library(rbenchmark) do.call.rbind <- function(lst) { while(length(lst) > 1) { idxlst <- seq(from=1, to=length(lst), by=2) lst <- lapply(idxlst, function(i) { if(i==length(lst)) { return(lst[[i]]) } return(rbind(lst[[i]], lst[[i+1]])) }) } lst[[1]] } to.period.cumulative <- function(x, name=NULL, period="days", numPeriods=15) { if(is.null(name)) name <- deparse(substitute(x)) cnames <- c("Open", "High", "Low", "Close") if (has.Vo(x)) cnames <- c(cnames, "Volume") cnames <- paste(name, cnames, sep=".") if (quantmod:::is.OHLCV(x)) { x <- quantmod:::OHLCV(x) out <- do.call.rbind( lapply(split(x, f=period, k=numPeriods), function(x) cbind(rep(first(x[,1]), NROW(x[,1])), cummax(x[,2]), cummin(x[,3]), x[,4], cumsum(x[,5])))) } else if (quantmod:::is.OHLC(x)) { x <- OHLC(x) out <- do.call.rbind( lapply(split(x, f=period, k=numPeriods), function(x) cbind(rep(first(x[,1]), NROW(x[,1])), cummax(x[,2]), cummin(x[,3]), x[,4]))) } else { stop("Object does not have OHLC(V).") } colnames(out) <- cnames return(out) } EMA.cumulative<-function(cumulativeBars, nEMA = 4, period="days", numPeriods=15) { barsEndptCl <- Cl(cumulativeBars[endpoints(cumulativeBars, on=period, k=numPeriods)]) # TODO: This is sloooooooooooooooooow... outEMA <- do.call.rbind( lapply(split(Cl(cumulativeBars), period), function(x) { previousFullBars <- barsEndptCl[index(barsEndptCl) < last(index(x)), ] if (NROW(previousFullBars) >= (nEMA - 1)) { last(EMA(last(rbind(previousFullBars, x), n=(nEMA + 1)), n=nEMA)) } else { xts(NA, order.by=index(x)) } })) colnames(outEMA) <- paste("EMA", nEMA, sep="") return(outEMA) } EMA.cccode <- ' /* Initalize loop and PROTECT counters */ int i, P=0; /* ensure that cumbars and fullbarsrep is double */ if(TYPEOF(cumbars) != REALSXP) { PROTECT(cumbars = coerceVector(cumbars, REALSXP)); P++; } /* Pointers to function arguments */ double *d_cumbars = REAL(cumbars); int i_nper = asInteger(nperiod); int i_n = asInteger(n); double d_ratio = asReal(ratio); /* Input object length */ int nr = nrows(cumbars); /* Initalize result R object */ SEXP result; PROTECT(result = allocVector(REALSXP,nr)); P++; double *d_result = REAL(result); /* Find first non-NA input value */ int beg = i_n*i_nper - 1; d_result[beg] = 0; for(i = 0; i <= beg; i++) { /* Account for leading NAs in input */ if(ISNA(d_cumbars[i])) { d_result[i] = NA_REAL; beg++; d_result[beg] = 0; continue; } /* Set leading NAs in output */ if(i < beg) { d_result[i] = NA_REAL; } /* Raw mean to start EMA - but only on full bars*/ if ((i != 0) && (i%i_nper == (i_nper - 1))) { d_result[beg] += d_cumbars[i] / i_n; } } /* Loop over non-NA input values */ int i_lookback = 0; for(i = beg+1; i < nr; i++) { i_lookback = i%i_nper; if (i_lookback == 0) { i_lookback = 1; } /*Previous result should be based only on full bars*/ d_result[i] = d_cumbars[i] * d_ratio + d_result[i-i_lookback] * (1-d_ratio); } /* UNPROTECT R objects and return result */ UNPROTECT(P); return(result); ' EMA.cc <- cfunction(signature(cumbars="numeric", nperiod="numeric", n="numeric", ratio="numeric"), EMA.cccode) EMA.cumulative.c<-function(cumulativeBars, nEMA = 4, period="days", numPeriods=15) { ratio <- 2/(nEMA+1) outEMA <- EMA.cc(cumbars=Cl(cumulativeBars), nperiod=numPeriods, n=nEMA, ratio=ratio) outEMA <- reclass(outEMA, Cl(cumulativeBars)) colnames(outEMA) <- paste("EMA", nEMA, sep="") return(outEMA) } getSymbols("SPY", from="2010-01-01") SPY.cumulative <- to.period.cumulative(SPY, name="SPY") system.time( SPY.EMA <- EMA.cumulative(SPY.cumulative) ) system.time( SPY.EMA.c <- EMA.cumulative.c(SPY.cumulative) ) res <- benchmark(EMA.cumulative(SPY.cumulative), EMA.cumulative.c(SPY.cumulative), columns=c("test", "replications", "elapsed", "relative", "user.self", "sys.self"), order="relative", replications=10) print(res)

EDIT: give an indication of a performance improvement over my bulky one (I'm sure this can be done better since I actually created a double loop) R here is a listing:

 > print(res) test replications elapsed relative user.self 2 EMA.cumulative.c(SPY.cumulative) 10 0.026 1.000 0.024 1 EMA.cumulative(SPY.cumulative) 10 57.732 2220.462 56.755

So, by my standards, the improvement of the SF type ...

Speeding up the calculation of WMA (weighted moving average)

More articles: