Rolling Regression over Multiple Columns

I am having trouble finding the most efficient way to calculate a rolling linear regression on a multi-column xts object. I have searched and read several previously asked questions here on Stack Overflow.

This question and answer comes close, but not close enough in my opinion, since I want to calculate multiple regressions in which the dependent variable stays the same across all regressions. I tried to reproduce an example with random data:

    library(xts)
    library(RcppArmadillo)  # provides fastLm()

    # Random data: 1500 rows x 5 columns, so 7500 values are needed
    data <- matrix(sample(1:10000, 7500), 1500, 5, byrow = TRUE)
    data[1000:1500, 2] <- NA  # insert NAs to make it more similar to true data
    data <- xts(data, order.by = as.Date(1:1500, origin = "2000-01-01"))

    NR <- nrow(data)  # number of observations
    NC <- ncol(data)  # number of factors
    obs <- 30         # required number of observations for rolling regression analysis

    info.names <- c("res", "coef")
    info <- array(NA, dim = c(NR, length(info.names), NC))
    colnames(info) <- info.names

The array stores several variables (residuals, coefficients, etc.) along the time and factor dimensions.

    loop.begin.time <- Sys.time()

    for (j in 2:NC) {
      cat(paste("Processing residuals for factor:", j), "\n")
      for (i in obs:NR) {
        regression.temp <- fastLm(data[i:(i-(obs-1)), j] ~ data[i:(i-(obs-1)), 1])
        residuals.temp <- regression.temp$residuals
        info[i, "res", j] <- round(residuals.temp[1] / sd(residuals.temp), 4)
        info[i, "coef", j] <- regression.temp$coefficients[2]
      }
    }

    loop.end.time <- Sys.time()
    print(loop.end.time - loop.begin.time)  # prints the loop runtime

As the loop shows, the idea is to run a 30-observation rolling regression each time, with data[, 1] as the dependent variable (factor) against one of the other factors. I have to store the 30 residuals in a temporary object in order to standardize them, since fastLm does not calculate standardized residuals.

The loop is extremely slow and cumbersome: if the number of columns (factors) in the xts object grows to around 100-1000, it will take forever. I hope someone has more efficient code for running rolling regressions over a large dataset.

r xts apply linear-regression rolling-computation
1 answer

This can be made quite fast if you go down to the math of linear regression. If X is the independent variable and Y is the dependent variable, the coefficients are given by

Beta = inv(t(X) %*% X) %*% (t(X) %*% Y)
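To make the formula concrete, here is a minimal base-R sketch on toy data (the data and variable names are made up for illustration). It checks the expression against lm(), and shows that with a single regressor and no intercept it collapses to a ratio of two sums, which is the property that rolling sums can exploit.

```r
# Toy data, for illustration only
set.seed(1)
x <- rnorm(30)
y <- 2 + 3 * x + rnorm(30, sd = 0.1)

# Design matrix with an intercept column
X <- cbind(1, x)

# Normal equations; solve(A, b) is the numerically safer form of inv(A) %*% b
beta <- solve(t(X) %*% X, t(X) %*% y)

# One regressor, no intercept: the formula reduces to sum(x * y) / sum(x * x)
beta_no_int <- sum(x * y) / sum(x * x)

# Reference fits from lm()
fit  <- lm(y ~ x)      # with intercept
fit0 <- lm(y ~ x - 1)  # through the origin

print(cbind(normal_eq = as.numeric(beta), lm = as.numeric(coef(fit))))
print(c(ratio_of_sums = beta_no_int, lm = as.numeric(coef(fit0))))
```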

I'm a little confused about which variable you want to be dependent and which is independent, but hopefully solving a similar problem below will help you.

In the example below, I use 1000 variables instead of the original 5 and do not insert any NAs.

    require(xts)

    data <- matrix(sample(1:10000, 1500000, replace = TRUE), 1500, 1000, byrow = TRUE)  # Random data
    data <- xts(data, order.by = as.Date(1:1500, origin = "2000-01-01"))

    NR <- nrow(data)  # number of observations
    NC <- ncol(data)  # number of factors
    obs <- 30         # required number of observations for rolling regression analysis

Now we can calculate the coefficients using Joshua's TTR package.

    library(TTR)

    loop.begin.time <- Sys.time()

    in.dep.var <- data[, 1]
    xx <- TTR::runSum(in.dep.var * in.dep.var, obs)
    coeffs <- do.call(cbind, lapply(data, function(z) {
      xy <- TTR::runSum(z * in.dep.var, obs)
      xy / xx
    }))

    loop.end.time <- Sys.time()
    print(loop.end.time - loop.begin.time)  # prints the loop runtime

    Time difference of 3.934461 secs
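Note that a ratio of running sums of x*y and x*x is the slope of a regression through the origin, whereas a formula of the form y ~ x, as in the question's fastLm call, also fits an intercept. If the intercept matters, the same rolling-sum idea should still apply with centered sums. Below is a minimal base-R sketch on toy data; roll_sum is a hypothetical helper written for this example (TTR::runSum could presumably be used in its place), and the data are made up.

```r
# Hypothetical helper: trailing rolling sum over n observations
# (NA for the first n - 1 positions), built on cumsum
roll_sum <- function(v, n) {
  cs <- cumsum(c(0, v))
  out <- rep(NA_real_, length(v))
  idx <- n:length(v)
  out[idx] <- cs[idx + 1] - cs[idx - n + 1]
  out
}

# Toy data: one common regressor x and two response columns
set.seed(42)
n_obs <- 200
obs <- 30
x <- rnorm(n_obs)
Y <- cbind(a = 1 + 2 * x + rnorm(n_obs),
           b = -1 + 0.5 * x + rnorm(n_obs))

# Sums that depend only on x are computed once
sx  <- roll_sum(x, obs)
sxx <- roll_sum(x * x, obs)

# Rolling slope with intercept, one column at a time:
# slope = (n * Sxy - Sx * Sy) / (n * Sxx - Sx^2)
slopes <- apply(Y, 2, function(y) {
  sy  <- roll_sum(y, obs)
  sxy <- roll_sum(x * y, obs)
  (obs * sxy - sx * sy) / (obs * sxx - sx^2)
})

# Compare one window against lm()
w <- 71:100
print(c(rolling = unname(slopes[100, "a"]),
        lm = unname(coef(lm(Y[w, "a"] ~ x[w]))[2])))
```

Only vectorized sums are involved, so the cost grows with the number of columns rather than with the number of windows times the number of columns.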

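    # res.array[t, j, z]: residual of the observation z - 1 steps back in the
    # window ending at time t, for factor j
    res.array <- array(NA, dim = c(NR, NC, obs))
    for (z in seq(obs)) {
      k <- z - 1
      res.array[, , z] <- coredata(lag.xts(data, k) - coeffs * as.numeric(lag.xts(in.dep.var, k)))
    }
    # Scale each window's residuals by their standard deviation (result is obs x NR x NC)
    res.sd <- apply(res.array, c(1, 2), function(z) z / sd(z))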

If I have not made any indexing errors, res.sd should give you the standardized residuals. Feel free to edit this answer to fix any errors.
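If only the standardized residual of the latest observation in each window is needed (which is how I read the "res" column in the question, and that reading is an assumption), the large residual array might be avoided altogether. With an intercept in the model the residuals of a window average to zero, so their standard deviation follows from the residual sum of squares, which can itself be written in terms of rolling sums. A minimal base-R sketch on toy data, where roll_sum is again a hypothetical helper and not part of any package:

```r
# Hypothetical helper: trailing rolling sum over n observations (base R only)
roll_sum <- function(v, n) {
  cs <- cumsum(c(0, v))
  out <- rep(NA_real_, length(v))
  idx <- n:length(v)
  out[idx] <- cs[idx + 1] - cs[idx - n + 1]
  out
}

# Toy data, for illustration only
set.seed(7)
n_obs <- 200
obs <- 30
x <- rnorm(n_obs)
y <- 1 + 2 * x + rnorm(n_obs)

# Centered rolling sums of squares and cross-products
sx <- roll_sum(x, obs)
sy <- roll_sum(y, obs)
sxx_c <- roll_sum(x * x, obs) - sx^2 / obs
syy_c <- roll_sum(y * y, obs) - sy^2 / obs
sxy_c <- roll_sum(x * y, obs) - sx * sy / obs

# Rolling slope and intercept of y ~ x
slope <- sxy_c / sxx_c
intercept <- (sy - slope * sx) / obs

# Residual of the latest observation in each window
res_last <- y - intercept - slope * x

# Residuals have mean zero, so sd(residuals) = sqrt(SSE / (obs - 1)),
# with SSE = Syy_c - slope * Sxy_c
sse <- syy_c - slope * sxy_c
res_std <- res_last / sqrt(sse / (obs - 1))

# Compare one window against lm()
w <- 71:100
r <- resid(lm(y[w] ~ x[w]))
print(c(rolling = res_std[100], lm = unname(r[obs] / sd(r))))
```

Subtracting large sums can lose precision when the data have a large mean relative to their spread, so centering or rescaling the columns first may be worthwhile on real data.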
