Combining a join (on) with a where condition in R data.table

Is there a way to combine a join and a subset in a single data.table call? Let's say I have the following table:

dt = data.table(itemID = c(1,1,2,2),bucketID = c(1,2,2,3),value = 1:4)

I want to set the value to zero for the earliest bucket (the smallest bucketID) of each item. My thought was to run:

ends = dt[,.(min = min(bucketID)),itemID]
dt[ends,on="itemID",bucketID==min,value:=0]

i.e. join the tables, find the rows where bucketID equals min, and then update the value column. But that does not work. I can get the correct results with:

ends = dt[,.(min = min(bucketID)),itemID]
dt = dt[ends,on="itemID"][bucketID==min,value:=0][,c(-4)]

However, this feels like a workaround. Is there a better way to combine a join with a where condition?

1 answer

Extending your join approach, you can join on both columns at once, matching itemID to itemID and bucketID to min:

dt[
    ends
    , on = c("itemID", bucketID = "min")
    , value := 0
]

dt
#    itemID bucketID value
# 1:      1        1     0
# 2:      1        2     2
# 3:      2        2     0
# 4:      2        3     4
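
For comparison, here is a minimal self-contained sketch of the update join above next to a variant that skips the helper table by selecting row numbers with .I. The names dt1, dt2 and idx are only illustrative and are not from the question; copies are used so that both variants start from the same data, since := modifies a table by reference.

```r
library(data.table)

dt <- data.table(itemID = c(1, 1, 2, 2), bucketID = c(1, 2, 2, 3), value = 1:4)

# Variant 1: update join against a helper table with one row per item.
ends <- dt[, .(min = min(bucketID)), by = itemID]
dt1 <- copy(dt)
# Match dt1$itemID to ends$itemID and dt1$bucketID to ends$min,
# then assign by reference only in the matched rows.
dt1[ends, on = c("itemID", bucketID = "min"), value := 0L]

# Variant 2: no helper table. Collect the row numbers (.I) of the rows
# holding each item's smallest bucketID, then update those rows.
dt2 <- copy(dt)
idx <- dt2[, .I[bucketID == min(bucketID)], by = itemID]$V1
dt2[idx, value := 0L]
```

Both variants zero every row that ties for the smallest bucketID within an item. If only the first such row should change, .I[which.min(bucketID)] could be used in the second variant instead.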

Source: https://habr.com/ru/post/1692946/