Join with by and on to create an aggregated column in data.table

I have two datasets: a detail dataset with weights and a summary dataset. I am trying to add the aggregated weights to the summary dataset by joining the two, but it is not working properly.

Here is some sample code.

 mytesta <- data.table(cola = c("a","b"), groupa = c(1,2)) # summary
 mytestb <- data.table(groupa = c(1,1,1,1,2,2,2), weighta = c(10,20,30,25,15,30,10)) # detail

And this is my desired result.

    cola groupa weighta
 1:    a      1      85
 2:    b      2      55

What I tried:

 mytesta[mytestb, on = "groupa", weight_summary := sum(i.weighta), by = "groupa"] 

The problem is that when by is used, the columns of the inner data.table (mytestb) are no longer available (for example, mytesta[mytestb, on = "groupa", .SD, by = "groupa"] ). Is there any way around this?

2 answers

I would do

 mytesta[, v := mytestb[.SD, on=.(groupa), sum(weighta), by=.EACHI]$V1 ] 

In a join X[Y], each row of Y is looked up in X.

So, if the ultimate goal is to create a new column in Y that is computed for each of its rows, we need the join Y[, v := X[Y, ...]] , although Y[X, v := ...] may seem more intuitive at first.
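To make the two steps visible, here is a small sketch that splits the one-liner above into the inner join and the outer assignment. It reuses the question's sample data; the intermediate name sums and the column name weight_summary are only chosen for illustration.

```r
library(data.table)

mytesta <- data.table(cola = c("a", "b"), groupa = c(1, 2))      # summary
mytestb <- data.table(groupa = c(1, 1, 1, 1, 2, 2, 2),
                      weighta = c(10, 20, 30, 25, 15, 30, 10))   # detail

# Inner step: look up each row of mytesta in mytestb and aggregate the
# matching rows. by = .EACHI evaluates j once per row of mytesta.
# The unnamed result of j is returned as column V1.
sums <- mytestb[mytesta, on = .(groupa), sum(weighta), by = .EACHI]
print(sums)

# Outer step: sums has one row per row of mytesta, in the same order,
# so V1 can be assigned directly as a new column.
mytesta[, weight_summary := sums$V1]
print(mytesta)
```

Because the result of the inner join follows the row order of mytesta, no further matching is needed when assigning the column.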


Here is a solution in which I first merge your two data.tables and then summarize.

 tab = merge(mytesta, mytestb, by="groupa")
 tab
 #    groupa cola weighta
 # 1:      1    a      10
 # 2:      1    a      20
 # 3:      1    a      30
 # 4:      1    a      25
 # 5:      2    b      15
 # 6:      2    b      30
 # 7:      2    b      10

 res = tab[, list(weighta=sum(weighta)), by=list(cola, groupa)]
 res
 #    cola groupa weighta
 # 1:    a      1      85
 # 2:    b      2      55
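A possible variation, which is not part of the answer above, is to reverse the order: aggregate the detail table first and then add the result to the summary table with an update join. The sketch below assumes the question's sample data; the name agg is hypothetical.

```r
library(data.table)

mytesta <- data.table(cola = c("a", "b"), groupa = c(1, 2))      # summary
mytestb <- data.table(groupa = c(1, 1, 1, 1, 2, 2, 2),
                      weighta = c(10, 20, 30, 25, 15, 30, 10))   # detail

# Aggregate the detail table to one row per group.
agg <- mytestb[, .(weighta = sum(weighta)), by = groupa]

# Update join: for each row of mytesta that matches a row of agg,
# copy the aggregated value (i.weighta refers to agg's column).
mytesta[agg, on = .(groupa), weighta := i.weighta]
print(mytesta)
```

Unlike merge(), which by default drops groups without a match, this keeps every row of mytesta and leaves NA where a group has no detail rows.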

Source: https://habr.com/ru/post/1265408/

