Count unique column values by pairwise combinations of another column, grouped by a third column, in R

Quite a challenge, to be honest. This is basically an extension of a question I asked earlier: "Count unique column values in pairs by combinations of another column in R".

Let's say this time I have the following data frame in R:

data.frame(Reg.ID = c(1,1,2,2,2,3,3), Location = c("X","X","Y","Y","Y","X","X"), Product = c("A","B","A","B","C","B","A")) 

The data is as follows:

  Reg.ID Location Product
1      1        X       A
2      1        X       B
3      2        Y       A
4      2        Y       B
5      2        Y       C
6      3        X       B
7      3        X       A

I would like to count the unique values of the Reg.ID column for each pairwise combination of values in the Product column, grouped by the Location column. The result should look like this:

  Location Prod.Comb Count
1        X       A,B     2
2        Y       A,B     1
3        Y       A,C     1
4        Y       B,C     1

I tried to get this result with base R functions but had no success. I assume there is a fairly simple solution using the data.table package?

Any help would be greatly appreciated. Thanks!

2 answers

Not a thoroughly tested idea, but this is the first thing that comes to mind with data.table:

 library(data.table)

 dt <- data.table(Reg.ID = c(1,1,2,2,2,3,3),
                  Location = c("X","X","Y","Y","Y","X","X"),
                  Product = c("A","B","A","B","C","B","A"))

 # Self-join on Location so every product is paired with every other product
 dt.cj <- merge(dt, dt, by = "Location", all = TRUE, allow.cartesian = TRUE)

 # Keep each unordered pair once, then count distinct Reg.IDs per pair
 dt.res <- dt.cj[Product.x < Product.y,
                 .(cnt = length(unique(Reg.ID.x))),
                 by = .(Location, Product.x, Product.y)]

 #    Location Product.x Product.y cnt
 # 1:        X         A         B   2
 # 2:        Y         A         B   1
 # 3:        Y         A         C   1
 # 4:        Y         B         C   1
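One caveat worth checking: a self-join on Location alone also pairs products that belong to different Reg.IDs at the same location. On the question's data this happens to give the expected counts, but it may over-count on other data. The sketch below uses base R `merge()` and `aggregate()` so it runs without extra packages, and adds one hypothetical row (Reg.ID 4 with only product "C" at location "X", not part of the original data) to make the difference visible. It assumes the intent is that both products of a pair must come from the same Reg.ID.

```r
# Question's data plus ONE HYPOTHETICAL row: Reg.ID 4 has only product "C" at "X".
df <- data.frame(
  Reg.ID   = c(1, 1, 2, 2, 2, 3, 3, 4),
  Location = c("X", "X", "Y", "Y", "Y", "X", "X", "X"),
  Product  = c("A", "B", "A", "B", "C", "B", "A", "C"),
  stringsAsFactors = FALSE
)

count_ids <- function(ids) length(unique(ids))

# Join on Location only: products of different Reg.IDs get paired with each other.
by_loc <- merge(df, df, by = "Location")
by_loc <- by_loc[by_loc$Product.x < by_loc$Product.y, ]
loose  <- aggregate(Reg.ID.x ~ Location + Product.x + Product.y,
                    data = by_loc, FUN = count_ids)

# Join on Reg.ID and Location: only products of the same Reg.ID are paired.
by_both <- merge(df, df, by = c("Reg.ID", "Location"))
by_both <- by_both[by_both$Product.x < by_both$Product.y, ]
strict  <- aggregate(Reg.ID ~ Location + Product.x + Product.y,
                     data = by_both, FUN = count_ids)

print(loose)   # includes A,C and B,C pairs at X, although no single Reg.ID has both
print(strict)  # only pairs that occur within one Reg.ID
```

If the stricter reading is the intended one, the same idea should carry over to the data.table and dplyr versions by adding Reg.ID to the join columns and counting the resulting single Reg.ID column.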

A dplyr solution, plagiarizing from the question you mentioned:

 library(dplyr)

 df <- data.frame(Reg.ID = c(1,1,2,2,2,3,3),
                  Location = c("X","X","Y","Y","Y","X","X"),
                  Product = c("A","B","A","B","C","B","A"),
                  stringsAsFactors = FALSE)

 df %>%
   full_join(df, by = "Location") %>%             # pair every product with every other
   filter(Product.x < Product.y) %>%              # keep each unordered pair once
   group_by(Location, Product.x, Product.y) %>%
   summarise(Count = length(unique(Reg.ID.x))) %>%
   mutate(Prod.Comb = paste(Product.x, Product.y, sep = ",")) %>%
   ungroup %>%
   select(Location, Prod.Comb, Count) %>%
   arrange(Location, Prod.Comb)

 # # A tibble: 4 × 3
 #   Location Prod.Comb Count
 #      <chr>     <chr> <int>
 # 1        X       A,B     2
 # 2        Y       A,B     1
 # 3        Y       A,C     1
 # 4        Y       B,C     1
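For anyone who wants to stay in base R, as the question first attempted, here is a minimal sketch that avoids the self-join. It splits the data by Reg.ID and Location, builds the product pairs inside each group with `combn()`, and then counts distinct Reg.IDs per pair. It assumes a pair only counts when both products belong to the same Reg.ID; the names `groups`, `pairs` and `res` are just illustrative.

```r
df <- data.frame(
  Reg.ID   = c(1, 1, 2, 2, 2, 3, 3),
  Location = c("X", "X", "Y", "Y", "Y", "X", "X"),
  Product  = c("A", "B", "A", "B", "C", "B", "A"),
  stringsAsFactors = FALSE
)

# One piece per (Reg.ID, Location) group; drop = TRUE skips empty combinations.
groups <- split(df, list(df$Reg.ID, df$Location), drop = TRUE)

# For each group, list every unordered pair of its distinct products.
pair_list <- lapply(groups, function(g) {
  prods <- sort(unique(g$Product))
  if (length(prods) < 2) return(NULL)  # a single product forms no pair
  data.frame(
    Reg.ID    = g$Reg.ID[1],
    Location  = g$Location[1],
    Prod.Comb = combn(prods, 2, FUN = paste, collapse = ","),
    stringsAsFactors = FALSE
  )
})
pairs <- do.call(rbind, Filter(Negate(is.null), pair_list))

# Count distinct Reg.IDs per Location and product pair.
res <- aggregate(Reg.ID ~ Location + Prod.Comb, data = pairs,
                 FUN = function(ids) length(unique(ids)))
names(res)[names(res) == "Reg.ID"] <- "Count"
res <- res[order(res$Location, res$Prod.Comb), ]
print(res)
```

Because pairs are built per group rather than through a Cartesian join, the intermediate table only contains the pairs that actually occur, which may matter for locations with many products.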

Source: https://habr.com/ru/post/1264186/

