Join by and on to combine and create a pivot column for data.table

Question


I have two datasets: a detailed weight dataset and a summary dataset. I am trying to add an aggregated weight column to the summary dataset by joining it to the detail dataset, but it is not working as expected.

Here is some sample code.

 library(data.table)
 mytesta <- data.table(cola = c("a","b"), groupa = c(1,2)) # summary
 mytestb <- data.table(groupa = c(1,1,1,1,2,2,2), weighta = c(10,20,30,25,15,30,10)) # detail

And this is my desired result.

    cola groupa weighta
 1:    a      1      85
 2:    b      2      55

What I tried:

 mytesta[mytestb, on = "groupa", weight_summary := sum(i.weighta), by = "groupa"]

The problem is that when by is used, the columns of the inner data.table (the one passed in i) disappear (for example, mytesta[mytestb, on = "groupa", .SD, by = "groupa"] ). Is there any way around this?

+5

r data.table

Naumz Mar 13 '17 at 22:45


2 answers

Here is a solution that first merges your two data.tables and then summarizes the result.

 tab = merge(mytesta, mytestb, by="groupa")
 tab
 #    groupa cola weighta
 # 1:      1    a      10
 # 2:      1    a      20
 # 3:      1    a      30
 # 4:      1    a      25
 # 5:      2    b      15
 # 6:      2    b      30
 # 7:      2    b      10

 res = tab[, list(weighta=sum(weighta)), by=list(cola, groupa)]
 res
 #    cola groupa weighta
 # 1:    a      1      85
 # 2:    b      2      55
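A variation on the same idea (not part of the original answer) is to aggregate the detail table first and then add the totals to the summary table with an update join, which avoids building the full merged table. This is only a minimal sketch using the question's tables; the name agg is illustrative.

```r
library(data.table)

mytesta <- data.table(cola = c("a", "b"), groupa = c(1, 2))        # summary
mytestb <- data.table(groupa = c(1, 1, 1, 1, 2, 2, 2),
                      weighta = c(10, 20, 30, 25, 15, 30, 10))     # detail

# Step 1: one row per group, holding the total weight of that group
agg <- mytestb[, .(weighta = sum(weighta)), by = groupa]

# Step 2: update join; i.weighta refers to the weighta column of agg
mytesta[agg, on = "groupa", weighta := i.weighta]

mytesta
```

Because agg has exactly one row per groupa, the update join assigns a single value to each matching row of mytesta.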

+1

bdemarest Mar 14 '17 at 0:22


Frank · Accepted Answer · 2017-03-14T17:42:02+0000

I would do

 mytesta[, v := mytestb[.SD, on=.(groupa), sum(weighta), by=.EACHI]$V1 ]

In a join X[Y], we look up each row of Y in X.

So, if the ultimate goal is to create a new column in Y that is computed for each of its rows, we need a join of the form Y[, v := X[Y, ...]], although Y[X, v := ...] may seem more intuitive at first.
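To make the lookup direction concrete, here is a minimal sketch that splits the one-liner into two steps, using the question's tables. Here X is mytestb (detail) and Y is mytesta (summary); the intermediate name per_row is only illustrative.

```r
library(data.table)

mytesta <- data.table(cola = c("a", "b"), groupa = c(1, 2))        # Y: summary
mytestb <- data.table(groupa = c(1, 1, 1, 1, 2, 2, 2),
                      weighta = c(10, 20, 30, 25, 15, 30, 10))     # X: detail

# X[Y, j, by = .EACHI]: for each row of Y, evaluate j on the matching rows of X.
# The unnamed result of j is returned in a column called V1.
per_row <- mytestb[mytesta, on = .(groupa), sum(weighta), by = .EACHI]

# per_row has one row per row of mytesta, in the same order,
# so V1 can be assigned directly as a new column.
mytesta[, v := per_row$V1]

mytesta
```

The result has one value per row of mytesta, which is why it can be assigned with := without any further matching.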
