Is there a way to combine a join with a subset (a row filter) in data.table? Let's say I have the following table:
dt = data.table(itemID = c(1,1,2,2),bucketID = c(1,2,2,3),value = 1:4)
I want to set the value to zero for the youngest bucket of each item, i.e. the row with the smallest bucketID per itemID. My thought was to run:
ends = dt[,.(min = min(bucketID)),itemID]
dt[ends,on="itemID",bucketID==min,value:=0]
i.e. join the tables, keep only the rows where bucketID equals min, and then update the value column. But that does not work. I can get the correct result with:
ends = dt[,.(min = min(bucketID)),itemID]
dt = dt[ends,on="itemID"][bucketID==min,value:=0][,c(-4)]
However, this feels like a workaround. Is there a better way to combine a join and a where-style filter in a single step?
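For reference, one direction that might avoid the intermediate joined table is to move the filter into the join condition itself. This is only a minimal sketch, assuming the update-join form `x[i, on = ..., j := ...]` and an `ends` table with one row per itemID as above. I have not checked how it behaves on larger data or with ties:

```r
library(data.table)

dt <- data.table(itemID = c(1, 1, 2, 2), bucketID = c(1, 2, 2, 3), value = 1:4)

# One row per item holding its smallest bucketID
ends <- dt[, .(min = min(bucketID)), by = itemID]

# Update join: match rows of dt on itemID and on bucketID == ends$min,
# then assign by reference only to the matched rows (0L keeps value integer)
dt[ends, on = .(itemID, bucketID = min), value := 0L]
```

Because the assignment happens by reference on the matched rows only, the helper column `min` never ends up in `dt`, so there is nothing to drop afterwards.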