I am on data.table 1.9.3, and maybe I'm wrong, but I don't remember the following being the expected behavior in earlier versions.
I am building two data.tables, dta and dtb:
    > dta
       idx vala fdx
    1:   1    2   a
    2:   2    4   a
    3:   3    6   b
    > dtb
       idx valb
    1:   1    3
    2:   4    6
    > dput(x = dta)
    structure(list(idx = c(1, 2, 3), vala = c(2, 4, 6), fdx = c("a", "a", "b")), .Names = c("idx", "vala", "fdx"), row.names = c(NA, -3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000000110788>, sorted = "idx")
    > dput(x = dtb)
    structure(list(idx = c(1, 4), valb = c(3, 6)), .Names = c("idx", "valb"), row.names = c(NA, -2L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000000110788>, sorted = "idx")
In both cases, the key is idx.
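The dput output above contains an external pointer, so it cannot be pasted back into R as is. A minimal sketch to rebuild both tables, assuming the data.table package is installed and taking the values from the printout above:

```r
library(data.table)

# Rebuild the two tables from the printed values
dta <- data.table(idx = c(1, 2, 3), vala = c(2, 4, 6), fdx = c("a", "a", "b"))
dtb <- data.table(idx = c(1, 4), valb = c(3, 6))

# Key both tables on idx, matching the sorted = "idx" attribute in the dput output
setkey(dta, idx)
setkey(dtb, idx)
```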
The following works, of course:
    > dta[dtb, sum(valb)]
    [1] 9
However, this does not:
    > dta[dtb, sum(valb), by = fdx]
    Error in `[.data.table`(dta, dtb, sum(valb), by = fdx) :
      object 'valb' not found
But this does:
    > dta[dtb][, sum(valb), by = fdx]
       fdx V1
    1:   a  3
    2:  NA  6
If we look at the intermediate step:
    > dta[dtb]
       idx vala fdx valb
    1:   1    2   a    3
    2:   4   NA  NA    6
I would expect these two to give the same result:
    dta[dtb, sum(valb), by = fdx] == dta[dtb][, sum(valb), by = fdx]
Where am I wrong?