Using data.table to speed up rollapply

I am new to data.table, so I apologize if this is a very simple question.

I've heard that data.table significantly improves computation times when working with large amounts of data, and I would therefore like to see if data.table can help speed up the rollapply function.

If we have some one-dimensional data:

    library(xts)  # also loads zoo, which provides rollapply

    xts.obj <- xts(rnorm(1e6), order.by=as.POSIXct(Sys.time()-1e6:1), tz="GMT")
    colnames(xts.obj) <- "rtns"

a simple rolling quantile with a width of 100 and p = 0.75 takes a surprisingly long time, i.e. this line of code:

 xts.obj$quant.75 <- rollapply(xts.obj$rtns,width=100, FUN='quantile', p=0.75) 

seems to take forever...

Is there anything data.table can do to speed things up, i.e. is there a generic roll function that can be applied?

Or maybe a routine to convert the xts object to a data.table object, carry out the function in a speedy manner, and then convert back to an xts object at the end?
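Something like this round trip is what I have in mind (just a sketch of the plumbing; the object names and the placeholder step are mine, and whether the middle part can actually be made fast is exactly the question):

    library(xts)
    library(data.table)
    # xts -> data.table: keep the index as an ordinary column
    dt <- data.table(time = index(xts.obj), rtns = as.numeric(xts.obj$rtns))
    # ... the (hoped-for) fast rolling computation would go here ...
    # data.table -> xts: rebuild the time series from the index column
    xts.out <- xts(dt$rtns, order.by = dt$time)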

Thanks in advance,

hlm

p.s. I didn't seem to get a response from the data.table mailing list, so I am posting here to see if I have better luck.

p.p.s. Having a quick go at another example using data frames, the data.table solution seems to take longer than the rollapply function, as shown below:

    > x <- data.frame(x=rnorm(10000))
    > x.dt <- data.table(x)
    > system.time(l1 <- as.numeric(rollapply(x,width=10,FUN=quantile,p=0.75)))
       user  system elapsed
       2.69    0.00    2.68
    > system.time(l <- as.numeric(unlist(x.dt[,lapply(1:((nrow(x.dt))-10+1),
    +     function(i){ x.dt[i:(i+10-1),quantile(x,p=0.75)] })])))
       user  system elapsed
      11.22    0.00   11.51
    > identical(l,l1)
    [1] TRUE
+6
r xts dataframe data.table apply
Aug 27 '12 at 22:27
2 answers

data.table is completely irrelevant here - you are essentially running sapply on a vector, which is pretty much the fastest operation you can get (other than going to C). Data frames and data tables will always be slower than vectors. You can gain a bit by using a plain vector (without xts dispatch), but the only easy way to get this done quickly is to parallelize:

    > library(parallel)  # for mclapply
    > x = as.vector(xts.obj$rtns)
    > system.time(unclass(mclapply(1:(length(x) - 99),
    +     function(i) quantile(x[i:(i + 99)], p=0.75), mc.cores=32)))
       user  system elapsed
    325.481  15.533  11.221

If you need it even faster, you may want to write a specialized function: the naive approach re-sorts each window, which is obviously wasteful. All you really need to do is drop the one outgoing element and sort in the incoming one to obtain the quantile, so you could expect roughly a 50x speedup if you did that, but you would have to code it yourself (so it's only worth it if you use it often...).
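A minimal sketch of that idea (my illustration, not code from the answer; roll_quantile_sorted is a made-up name): keep the window as a sorted vector and use findInterval to delete the outgoing value and insert the incoming one, rather than re-sorting. The interpolation reproduces quantile()'s default type-7 definition:

    # Rolling quantile via an incrementally maintained sorted window (sketch)
    roll_quantile_sorted <- function(x, width, p = 0.75) {
      n <- length(x)
      out <- rep(NA_real_, n - width + 1)
      win <- sort(x[1:width])                 # initial sorted window
      idx <- (width - 1) * p + 1              # type-7 quantile position
      lo <- floor(idx); hi <- ceiling(idx); frac <- idx - lo
      q_of <- function(w) w[lo] + frac * (w[hi] - w[lo])
      out[1] <- q_of(win)
      for (i in seq_len(n - width)) {
        pos <- findInterval(x[i], win)        # last index with win[pos] <= x[i]
        win <- win[-pos]                      # drop the outgoing element
        pos <- findInterval(x[i + width], win)
        win <- append(win, x[i + width], after = pos)  # insert incoming, stay sorted
        out[i + 1] <- q_of(win)
      }
      out
    }

Note that append() still copies the whole window on every step, so in pure R this mostly saves the re-sort rather than the copying; the real ~50x win would come from doing the same bookkeeping in C.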

+7
Aug 28 '12 at 3:26

data.table is fast at splitting data by a key. I don't think data.table currently supports a rolling key, or an expression in the by or i arguments that would do this.
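For context, a minimal sketch (mine, not part of the original answer) of what a key buys you, and why it doesn't directly express rolling windows:

    library(data.table)
    dt <- data.table(grp = rep(1:1000, each = 10), x = rnorm(10000))
    setkey(dt, grp)                           # keyed: grouping/joins on grp are fast
    res <- dt[, quantile(x, 0.75), by = grp]  # one value per disjoint group
    # rolling windows overlap, so no single key column identifies them --
    # hence the padded multi-column trick further below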

You can use the fact that subsetting a data.table is faster than subsetting a data.frame:

    DT <- as.data.table(x)
    .x <- 1:(nrow(DT)-9)
    system.time(.xl <- unlist(lapply(.x, function(.i)
        DT[.i:(.i+10), quantile(x, 0.75, na.rm = T)])))
       user  system elapsed
       8.77    0.00    8.77

Or you can create key variables that uniquely identify the rolling windows. The width is 10, so we need 10 key columns (padded with NA_real_):

    library(plyr)  # for as.quoted
    .j <- paste0('x',1:10, ':= c(rep(NA_real_,',0:9,
                 '),rep(seq(',1:10,',9991,by=10),each=10), rep(NA_real_,',
                 c(0,9:1),'))')

    datatable <- function(){
      invisible(lapply(.j, function(.jc) x.dt[,eval(as.quoted(.jc)[[1]])]))
      x_roll <- rbind(x.dt[!is.na(x1),quantile(x,0.75),by=x1],
                      x.dt[!is.na(x2),quantile(x,0.75),by=x2],
                      x.dt[!is.na(x3),quantile(x,0.75),by=x3],
                      x.dt[!is.na(x4),quantile(x,0.75),by=x4],
                      x.dt[!is.na(x5),quantile(x,0.75),by=x5],
                      x.dt[!is.na(x6),quantile(x,0.75),by=x6],
                      x.dt[!is.na(x7),quantile(x,0.75),by=x7],
                      x.dt[!is.na(x8),quantile(x,0.75),by=x8],
                      x.dt[!is.na(x9),quantile(x,0.75),by=x9],
                      x.dt[!is.na(x10),quantile(x,0.75),by=x10],
                      use.names = F)
      setkeyv(x_roll,'x1')
      invisible(x.dt[,x1:= 1:10000])
      setkeyv(x.dt,'x1')
      x_roll[x.dt][, list(x,V1)]
    }

    l1 <- function() as.numeric(rollapply(x,width=10,FUN=quantile,p=0.75))

    lapply_only <- function() unclass(lapply(1:(nrow(x) - 9),
        function(i) quantile(x[['x']][i:(i + 9)], p=0.75)))

    library(rbenchmark)  # for benchmark()
    benchmark(datatable(), l1(), lapply_only(), replications = 5)
    ##            test replications elapsed relative user.self
    ## 1   datatable()            5    9.41 1.000000      9.40
    ## 2          l1()            5   10.97 1.165781     10.85
    ## 3 lapply_only()            5   10.39 1.104145     10.35

EDIT --- benchmarking

data.table is faster than rollapply and raw lapply. I cannot test the parallel solution.

+5
Aug 28 '12


