I want to compute a rolling Value at Risk (VaR) for a data set of about 22.5 million rows, so I would like to use sparklyr to speed up the calculation. Here is what I did (using an example data set):
    library(PerformanceAnalytics)
    library(reshape2)
    library(dplyr)

    data(managers)
    data <- zerofill(managers)
    data <- as.data.frame(data)
    class(data)
    data$date = row.names(data)
    lmanagers <- melt(data, id.vars = c('date'))
Next I compute the rolling VaR locally with the dplyr and PerformanceAnalytics packages:
    library(zoo)  # for rollapply()

    var <- lmanagers %>%
      group_by(variable) %>%
      arrange(variable, date) %>%
      mutate(var = rollapply(value, 10,
                             FUN = function(x) VaR(x, p = .95, method = "modified", align = "right"),
                             partial = TRUE))
This works fine. I then tried the same pipeline on a Spark table through sparklyr:
    library(sparklyr)

    sc <- spark_connect(master = "local")
    lmanagers_sp <- copy_to(sc, lmanagers)
    src_tbls(sc)

    var_sp <- lmanagers_sp %>%
      group_by(variable) %>%
      arrange(variable, date) %>%
      mutate(var = rollapply(value, 10,
                             FUN = function(x) VaR(x, p = .95, method = "modified", align = "right"),
                             partial = TRUE)) %>%
      collect()
But this gives the following error:
Error: Unknown input type: pairlist
Can someone tell me what causes this error and what the correct code would be? Any other approach for computing the rolling VaR quickly on data of this size would also be appreciated.
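To make clear what I need Spark to do for each manager, here is the per-group computation isolated as a plain R function. This is only a sketch: `rolling_var` and the `toy` data frame are names and values I made up for illustration, and unlike my dplyr version it uses right-aligned full windows (`fill = NA`) instead of partial ones. My guess, which I have not confirmed, is that the error comes from dplyr trying to translate the anonymous R function into Spark SQL, and that a self-contained function of this shape might instead be passed to sparklyr's `spark_apply()` with `group_by = "variable"`, assuming zoo and PerformanceAnalytics are available to the Spark workers.

```r
library(PerformanceAnalytics)  # VaR()
library(zoo)                   # rollapply()

# Hypothetical helper (not part of any package): takes the rows of ONE
# manager and returns them with a rolling modified VaR column appended.
# Functions are namespaced so the closure does not depend on attached packages.
rolling_var <- function(df, width = 10) {
  df <- df[order(df$date), ]
  df$var <- zoo::rollapply(
    df$value, width,
    FUN = function(x) {
      as.numeric(PerformanceAnalytics::VaR(x, p = 0.95, method = "modified"))
    },
    align = "right",  # window ends at the current row
    fill = NA         # rows without a full window get NA
  )
  df
}

# Toy input with the same columns as lmanagers; the values are made up.
set.seed(1)
toy <- data.frame(
  date = rep(format(seq(as.Date("2000-01-01"), by = "month", length.out = 24)), 2),
  variable = rep(c("A", "B"), each = 24),
  value = rnorm(48, sd = 0.02),
  stringsAsFactors = FALSE
)

# Local check without Spark: split by manager, apply, and recombine.
var_toy <- do.call(rbind, lapply(split(toy, toy$variable), rolling_var))
```

If the function behaves as intended locally, the remaining question is whether running it per group on Spark is fast enough for 22.5 million rows, since each group is still processed by ordinary R code.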