I have two data.tables: an experiment data table x and a category lookup table dict.
library(data.table)
set.seed(123)
x = data.table(samp=c(1,1,2,3,3,3,4,5,5,5,6,7,7,7,8,9,9,10,10), y=rnorm(19))
x
dict = data.table(samp=c(1:5, 4:8, 7:10), cat=c(rep(1,length(1:5)), rep(2,length(4:8)), rep(3,length(7:10))))
dict
For each samp, I first need to calculate the product of all y associated with it. Then I need to calculate the sum of these products for each sample category given in dict$cat. Please note that a samp can map to more than one dict$cat.
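To make the target concrete, here is a minimal sketch of the two-step calculation on a toy input. The names toyX, toyDict, prodBySamp, withCat and sumByCat are made up for this illustration and are not part of my real data; it only shows the arithmetic I am after, not necessarily the best way to get it.

```r
library(data.table)

# Toy inputs (hypothetical, much smaller than the real x and dict)
toyX    = data.table(samp = c(1, 1, 2, 3), y = c(2, 3, 5, 7))
toyDict = data.table(samp = c(1, 2, 2, 3), cat = c(1, 1, 2, 2))

# Step 1: product of all y for each samp
# samp 1: 2 * 3 = 6, samp 2: 5, samp 3: 7
prodBySamp = toyX[, .(prodY = prod(y)), by = samp]

# Step 2: attach every category of each samp, then sum the products per category
# cat 1: 6 + 5 = 11, cat 2: 5 + 7 = 12
withCat  = merge(toyDict, prodBySamp, by = "samp")
sumByCat = withCat[, .(sumProd = sum(prodY)), by = cat]
sumByCat
```

Here samp 2 belongs to both categories, so its product is counted once in each sum.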
One way to do this is to merge x and dict right away, allowing rows to be duplicated (allow.cartesian=T):
setkey(dict, samp)
setkey(x, samp)
step0 = dict[x, allow.cartesian=T]
setkey(step0, samp, cat)
step1 = step0[, list(prodY=prod(y)[1], cat=cat[1]), by=c("samp", "cat")]
resMet1 = step1[, sum(prodY), by="cat"]
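As a small illustration of the row duplication this join allows, here is a sketch on toy tables. The names toyX, toyDict and joined are hypothetical, and the tables are far smaller than my real ones.

```r
library(data.table)

# Toy tables (hypothetical): samp 2 belongs to two categories
toyX    = data.table(samp = c(1, 2, 2), y = c(10, 20, 30), key = "samp")
toyDict = data.table(samp = c(1, 2, 2), cat = c(1, 1, 2), key = "samp")

# Same join shape as step0: every row of toyX is matched to
# every toyDict row that has the same samp
joined = toyDict[toyX, allow.cartesian = TRUE]

nrow(toyX)    # 3 rows before the join
nrow(joined)  # 5 rows after: 1 for samp 1, 2 x 2 for samp 2
```

Each y value of samp 2 shows up once per category, so the joined table grows with the number of categories a samp belongs to.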
However, this duplicates rows of x up front. An alternative is to go through each dict$cat and subset x for it, like so:
setkey(x, samp)
setkey(dict,samp)
pool = vector("list")
for(n in unique(dict$cat)){
thisCat = x[J(dict[cat==n])]
setkey(thisCat, samp)
step1 = thisCat[, list(prodY=prod(y)[1], cat=cat[1]), by="samp"]
pool[[n]] = step1[, sum(prodY), by="cat"]
}
resMet2 = rbindlist(pool)
Is there a more idiomatic data.table way to do this, perhaps using J()?