Multiply two data.tables, keep all the features

I could not find a duplicate of this question.

My problem is this:

I have two data.tables: one with two columns (featurea, count), the other with three columns (origin, color, count). I want to multiply (?) them so that I get a new data.table with all the features. The catch is that the features do not match, so merge-based solutions may not do the trick.

MRE as follows:

    library(data.table)

    # two columns
    DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
    #    featurea count
    # 1:    type1     2
    # 2:    type2     3

    # three columns
    DT2 <- data.table(origin = c("house", "park", "park"),
                      color  = c("red", "blue", "red"),
                      count  = c(2, 1, 2))
    #    origin color count
    # 1:  house   red     2
    # 2:   park  blue     1
    # 3:   park   red     2

My expected result in this case is a data.table like this:

    > DT3
       origin color featurea total
    1:  house   red    type1     4
    2:  house   red    type2     6
    3:   park  blue    type1     2
    4:   park  blue    type2     3
    5:   park   red    type1     4
    6:   park   red    type2     6
3 answers

Here is one way. First, I expanded the rows of DT2 with expandRows() from the splitstackshape package. Each row is repeated nrow(DT1) times (twice here), because I passed count = nrow(DT1) with count.is.col = FALSE. Then I did the multiplication and stored it in a new column called total, relying on DT1's count being recycled across the repeated rows. In the same step I added a column for featurea. Finally, I removed count.

    library(data.table)
    library(splitstackshape)

    expandRows(DT2, count = nrow(DT1), count.is.col = FALSE)[
      , `:=`(total = count * DT1[, count], featurea = DT1[, featurea])
    ][, count := NULL]

EDIT

If you do not want to load another package, you can try David's idea from his comment.

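    # `each =` repeats every DT2 row consecutively, so the recycled DT1
    # columns line up with it and the row order matches the output below
    DT2[rep(1:.N, each = nrow(DT1))][
      , `:=`(total = count * DT1$count, featurea = DT1$featurea, count = NULL)
    ][]
    #    origin color total featurea
    # 1:  house   red     4    type1
    # 2:  house   red     6    type2
    # 3:   park  blue     2    type1
    # 4:   park  blue     3    type2
    # 5:   park   red     4    type1
    # 6:   park   red     6    type2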
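The row-repetition idea can also be written with explicit index vectors, so that nothing depends on vector recycling. This is only a sketch on the question's example data; the names `i1` and `i2` are made up for illustration and are not part of any answer.

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
DT2 <- data.table(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# i2 repeats each DT2 row index nrow(DT1) times: 1 1 2 2 3 3
i2 <- rep(seq_len(nrow(DT2)), each = nrow(DT1))
# i1 cycles through the DT1 row indices once per DT2 row: 1 2 1 2 1 2
i1 <- rep(seq_len(nrow(DT1)), times = nrow(DT2))

# Pair every DT2 row with every DT1 row and multiply the counts
DT3 <- data.table(DT2[i2, .(origin, color)],
                  featurea = DT1$featurea[i1],
                  total    = DT2$count[i2] * DT1$count[i1])
DT3
```

Because both index vectors have length nrow(DT1) * nrow(DT2), every pairing is spelled out, which should make it easier to check that each combination appears exactly once.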

Test this on larger data; I'm not sure how well optimized it is:

    DT2[, .(featurea = DT1[["featurea"]], count = count * DT1[["count"]]),
        by = .(origin, color)]
    #    origin color featurea count
    # 1:  house   red    type1     4
    # 2:  house   red    type2     6
    # 3:   park  blue    type1     2
    # 4:   park  blue    type2     3
    # 5:   park   red    type1     4
    # 6:   park   red    type2     6

It might be better to switch the roles of the two tables if DT1 has fewer groups:

    DT1[, c(DT2[, .(origin, color)], .(count = count * DT2[["count"]])),
        by = featurea]
    #    featurea origin color count
    # 1:    type1  house   red     4
    # 2:    type1   park  blue     2
    # 3:    type1   park   red     4
    # 4:    type2  house   red     6
    # 5:    type2   park  blue     3
    # 6:    type2   park   red     6

A dplyr solution:

    library(dplyr)
    library(data.table)

    DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
    DT2 <- data.table(origin = c("house", "park", "park"),
                      color  = c("red", "blue", "red"),
                      count  = c(2, 1, 2))

Create a dummy column for the inner join (I called it key):

    inner_join(DT1 %>% mutate(key = 1), DT2 %>% mutate(key = 1), by = "key") %>%
      mutate(total = count.x * count.y) %>%
      select(origin, color, featurea, total) %>%
      arrange(origin, color)
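The same dummy-key idea can probably be expressed with data.table alone, using merge() with allow.cartesian = TRUE. The sketch below is an assumption-laden illustration rather than part of the answer: the helper column name `k` and the intermediate tables `A` and `B` are hypothetical, and copies are used so that DT1 and DT2 are not modified by reference.

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
DT2 <- data.table(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Add a constant helper column (hypothetical name `k`) to copies of both tables
A <- copy(DT2)[, k := 1L]
B <- copy(DT1)[, k := 1L]

# Every row of A matches every row of B on k, which yields the cross join;
# the clashing `count` columns become count.x (from A) and count.y (from B)
DT3 <- merge(A, B, by = "k", allow.cartesian = TRUE)
DT3[, total := count.x * count.y]

# Keep only the requested columns and sort for a stable row order
DT3 <- DT3[, .(origin, color, featurea, total)]
setorder(DT3, origin, color, featurea)
DT3
```

allow.cartesian = TRUE is passed because the joined result has more rows than the two inputs combined, which data.table otherwise treats as a likely mistake.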

Source: https://habr.com/ru/post/1013387/

