R data.table sliding window

What is the best (fastest) way to implement a sliding window function with the data.table package?

I am trying to calculate a rolling median, but I have multiple rows per day (due to two additional factors), which I think means that the zoo rollapply function will not work. Here is an example using a naive for loop:

    library(data.table)

    df <- data.frame(
      id=30000,
      date=rep(as.IDate(as.IDate("2012-01-01")+0:29, origin="1970-01-01"), each=1000),
      factor1=rep(1:5, each=200),
      factor2=1:5,
      value=rnorm(30, 100, 10)
    )

    dt = data.table(df)
    setkeyv(dt, c("date", "factor1", "factor2"))

    get_window <- function(date, factor1, factor2) {
      criteria <- data.table(
        date=as.IDate((date - 7):(date - 1), origin="1970-01-01"),
        factor1=as.integer(factor1),
        factor2=as.integer(factor2)
      )
      return(dt[criteria][, value])
    }

    output <- data.table(unique(dt[, list(date, factor1, factor2)]))[, window_median:=as.numeric(NA)]

    for(i in nrow(output):1) {
      print(i)
      output[i, window_median:=median(get_window(date, factor1, factor2))]
    }
+43
r time-series data.table sliding-window
Jul 26 '12 at 19:15
3 answers

data.table does not currently have special functions for rolling windows. More details here, in my answer to another similar question:

Is there a fast way to run a rolling regression inside data.table?

Rolling median is an interesting case. To do it efficiently you would need a specialized function (same link as in the earlier comment):

Rolling median algorithm in C

The data.table solutions in the question and in the other answers here are all very inefficient compared with a proper specialized rollingmedian function (which is not available for R, afaik).
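To illustrate what "specialized" means here, the following is a minimal sketch in plain R. It is not the C algorithm from the linked question; `rolling_median_sorted` is a hypothetical helper written for this example. It assumes a numeric vector with no `NA`s and one value per time step, and it keeps the current window sorted, removing the value that leaves and inserting the value that enters instead of re-sorting every window.

```r
# Hypothetical helper: trailing rolling median of width k over a numeric vector.
# The window is kept sorted and updated one element at a time.
rolling_median_sorted <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)          # positions before the first full window stay NA
  if (n < k) return(out)
  win <- sort(x[1:k])              # first full window, sorted once
  out[k] <- median(win)
  for (i in seq_len(n - k) + k) {
    # remove the value that falls out of the window
    win <- win[-match(x[i - k], win)]
    # insert the incoming value at its sorted position
    pos <- findInterval(x[i], win)
    win <- append(win, x[i], after = pos)
    # read the median straight from the sorted window
    out[i] <- if (k %% 2L == 1L) win[(k + 1L) / 2L] else mean(win[k / 2L + 0:1])
  }
  out
}

x <- c(5, 1, 4, 2, 3, 9, 7)
rolling_median_sorted(x, 3)
# NA NA 4 2 3 3 7
```

Because R copies the window vector on every update, each step still costs time proportional to `k`; a compiled implementation with a better data structure would be needed to get the efficiency the answer refers to.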

+6
Aug 31 '12 at 11:26

I managed to get the example down to 1.4s by creating a lagged dataset and doing one huge join.

    library(data.table)

    df <- data.frame(
      id=30000,
      date=rep(as.IDate(as.IDate("2012-01-01")+0:29, origin="1970-01-01"), each=1000),
      factor1=rep(1:5, each=200),
      factor2=1:5,
      value=rnorm(30, 100, 10)
    )

    dt <- data.table(df)
    setkeyv(dt, c("date", "factor1", "factor2"))

    unique_set <- data.table(unique(dt[, list(original_date=date, factor1, factor2)]))

    # Build the lagged lookup: one copy of the keys for each of the 7 previous days
    output2 <- data.table()
    for(i in 1:7) {
      output2 <- rbind(output2, unique_set[, date:=original_date-i])
    }
    setkeyv(output2, c("date", "factor1", "factor2"))

    # Each row of dt matches up to 7 lagged rows, so current data.table versions
    # need allow.cartesian=TRUE for a join this large
    output2 <- output2[dt, allow.cartesian=TRUE]
    output2 <- output2[, median(value), by=c("original_date", "factor1", "factor2")]

This works very well on this test dataset, but on my real data it fails with 8 GB of RAM. I am going to try moving up to one of the High-Memory EC2 instances (with 17, 34 or 68 GB of RAM) to get it working. Any ideas on how to do this with less memory would be appreciated.
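One possible way to avoid materializing the big join is to work one (factor1, factor2) group at a time with `by=`, so only that group's rows are needed while its windows are computed. The sketch below is untested on large data and makes no speed claim. `window_median` is a hypothetical helper, the test data is built with explicit lengths rather than recycling, and partial windows at the start of the series use whatever days are available, which is a choice made here.

```r
library(data.table)

set.seed(1)
dt <- data.table(
  date    = rep(as.IDate("2012-01-01") + 0:29, each = 1000),
  factor1 = rep(rep(1:5, each = 200), times = 30),
  factor2 = rep(1:5, length.out = 30000),
  value   = rnorm(30000, 100, 10)
)

# Hypothetical helper: for each distinct date within ONE group, the median of
# all values from the previous `width` days (date - width through date - 1).
# Returns NA when no earlier rows fall inside the window.
window_median <- function(date, value, width = 7L) {
  day <- as.integer(date)            # compare dates as day counts
  vapply(unique(day), function(d) {
    in_window <- day >= d - width & day < d
    if (any(in_window)) median(value[in_window]) else NA_real_
  }, numeric(1))
}

# One result row per (factor1, factor2, date)
output3 <- dt[, list(date = unique(date),
                     window_median = window_median(date, value)),
              by = c("factor1", "factor2")]
```

The inner step is still a plain scan per date, so this trades the join's memory use for repeated passes over each group.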

+3
Aug 10 '12 at 15:00

This solution works, but it takes some time.

    df <- data.frame(
      id=30000,
      date=rep(seq.Date(from=as.Date("2012-01-01"),to=as.Date("2012-01-30"),by="d"),each=1000),
      factor1=rep(1:5, each=200),
      factor2=1:5,
      value=rnorm(30, 100, 10)
    )

    # apply() passes each row as a character vector:
    # dff[2] is date, dff[3] is factor1, dff[4] is factor2
    myFun <- function(dff,df){
      # window: the 7 days before this row's date (date-7 through date-1), as in the question
      median(df$value[df$date>as.Date(dff[2])-8 & df$date<as.Date(dff[2]) &
                      df$factor1==dff[3] & df$factor2==dff[4]])
    }

    week_Med <- apply(df,1,myFun,df=df)
    week_Med_df <- cbind(df,week_Med)
0
Jul 27 2018-12-12T00:


