Given a data.table, I would like to subset its rows quickly. For instance:
library(data.table)
dt = data.table(a=1:10, key="a")
dt[a > 3 & a <= 7]
This is pretty slow. I know I can do joins to get individual rows, but is there a way to take advantage of the fact that the data.table is sorted by its key to get fast range subsets of this kind?
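For reference, here are two ways the sorted key could be exploited for this range, offered as a minimal sketch rather than a definitive answer. The first joins on the wanted key values, which only makes sense for an integer key; the second uses base R's `findInterval` to locate the boundaries in the sorted column and then takes the row range. The names `res_join`, `res_range`, `lo` and `hi` are made up for illustration, and `nomatch = NULL` assumes a reasonably recent data.table version.

```r
library(data.table)

dt <- data.table(a = 1:10, key = "a")

# Option 1 (integer keys only): join on the wanted key values.
# nomatch = NULL drops requested values that are absent from dt.
res_join <- dt[.(4:7), nomatch = NULL]

# Option 2: find the boundaries in the sorted key column,
# then select the contiguous block of rows between them.
lo <- findInterval(3, dt$a)  # number of rows with a <= 3
hi <- findInterval(7, dt$a)  # number of rows with a <= 7
res_range <- if (hi > lo) dt[(lo + 1L):hi] else dt[0L]
```

Both results should contain the rows with `a` equal to 4, 5, 6 and 7, the same rows as `dt[a > 3 & a <= 7]`.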
This is what I do:
dt1 = data.table(id = 1,
                 ym = c(199001, 199006, 199009, 199012),
                 last_ym = c(NA, 199001, 199006, 199009),
                 v = 1:4,
                 key=c("id", "ym"))
dt2 = data.table(id = 1,
                 ym = c(199001, 199002, 199003, 199004, 199005, 199006,
                        199007, 199008, 199009, 199010, 199011, 199012),
                 v2 = 1:12,
                 key=c("id","ym"))
For each id (here there is only one, 1) and each ym in dt1, I would like to sum the values of v2 between the current ym in dt1 and the last ym in dt1. That is, for ym == 199006 in dt1 I would like to return list(v2 = 2 + 3 + 4 + 5 + 6). These are the values of v2 in dt2 whose ym is less than or equal to the current ym and greater than the previous ym. In code:
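expr = expression({
  #browser();
  cur_id = id;
  cur_ym = ym;
  # rows of dt2 for this id whose ym lies in (last_ym, cur_ym]
  cur_dtb = dt2[J(cur_id)][ym <= cur_ym & ym > last_ym];
  setkey(cur_dtb, ym);
  list(r = sum(cur_dtb$v2))
})
# evaluate the expression once per (id, ym) row of dt1
dt1[, eval(expr), by=list(id, ym)]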
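One possible way to avoid the per-row subsetting is a non-equi join, sketched below under the assumption that the installed data.table version supports non-equi conditions in `on=`. The names `cur`, `res` and `out` are made up for illustration. The first row of dt1 is left out of the join because it has no previous ym (last_ym is NA).

```r
library(data.table)

dt1 <- data.table(id = 1,
                  ym = c(199001, 199006, 199009, 199012),
                  last_ym = c(NA, 199001, 199006, 199009),
                  v = 1:4,
                  key = c("id", "ym"))
dt2 <- data.table(id = 1,
                  ym = 199000 + 1:12,
                  v2 = 1:12,
                  key = c("id", "ym"))

# Keep only the rows of dt1 that have a previous ym to bound the window.
cur <- dt1[!is.na(last_ym)]

# For each row of cur, match the dt2 rows with the same id and
# last_ym < dt2$ym <= ym, then sum v2 over the matches (by = .EACHI
# evaluates j once per row of cur, in the order of cur's rows).
res <- dt2[cur, on = .(id, ym <= ym, ym > last_ym),
           .(r = sum(v2)), by = .EACHI]

# Attach the sums to the identifying columns of cur.
out <- data.table(id = cur$id, ym = cur$ym, r = res$r)
```

For the example data, `out$r` should be 20, 24 and 33 for ym 199006, 199009 and 199012, where 20 is the 2 + 3 + 4 + 5 + 6 from the description above.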