Recursive grouping in R

I am trying to find a way to create sequential Group_IDs based on "overlapping" variables. The easiest way to describe this is to use the example of a home, loan, and borrower.

Suppose we have the following example:

df <- data.frame(house     = c('H_01','H_02','H_03','H_04','H_05'),
                 loan      = c('L_01','L_02','L_02','L_03','L_04'),
                 borrower  = c('B_01','B_01','B_02','B_03','B_04'))

Many relationships can exist among the variables (house, loan, borrower). For example, house 1 (H_01) is associated with loan 1 (L_01) and borrower 1 (B_01). But B_01 is also tied to L_02, which is connected to H_02 and also to H_03, so the first 3 rows in my table should be marked G_01 (for group 1).

H_04 is associated with L_03, which does not appear in any other record, and B_03 does not relate to any other record either, so the fourth record should be in G_02. By the same reasoning, record 5 belongs in its own group, G_03.

Is there an elegant way, ideally with dplyr (preferred, but not required), to produce the groups G_01, G_02 and G_03?

2 answers

You are looking for "connected components". We can treat the relationships as a graph: first reshape the data (melt), then ask a well-implemented graph library (igraph) to do the work.

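library(reshape2)
library(igraph)

# Long format: one (house, value) edge per loan and per borrower
edges <- melt(df, id = "house")[, c(1, 3)]
# Component id of each house vertex, looked up by name (not by factor code)
components(graph_from_data_frame(edges))$membership[as.character(df$house)]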

  # H_01 H_02 H_03 H_04 H_05 
  #  1    1    1    2    3 
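The membership vector above is numeric, while the question asks for labels such as G_01. A minimal sketch of one way to attach such labels to the data frame, assuming igraph is installed; the names `edges`, `memb` and `ids` are illustrative and not part of the original answer, and the edge list is built with base R instead of melt:

```r
library(igraph)

df <- data.frame(house    = c('H_01','H_02','H_03','H_04','H_05'),
                 loan     = c('L_01','L_02','L_02','L_03','L_04'),
                 borrower = c('B_01','B_01','B_02','B_03','B_04'),
                 stringsAsFactors = FALSE)

# One edge from each house to its loan and one to its borrower
edges <- rbind(data.frame(from = df$house, to = df$loan),
               data.frame(from = df$house, to = df$borrower))

# Named vector: component id per vertex (houses, loans and borrowers)
memb <- components(graph_from_data_frame(edges, directed = FALSE))$membership

# Keep the house vertices only and renumber by order of first appearance
ids <- memb[df$house]
df$group <- sprintf("G_%02d", match(ids, unique(ids)))

df$group
# [1] "G_01" "G_01" "G_01" "G_02" "G_03"
```

Renumbering with `match(ids, unique(ids))` makes the labels follow row order regardless of how the library numbers its components.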

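To keep values from different columns apart when they share a label, prefix each value with its column name when building the edge list: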

with(melt(df,id="house"),data.frame(x=house,y=paste(variable,value,sep=".")))
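Why the prefix matters can be seen on a small made-up table in which a loan and a borrower happen to carry the same label. This is only a sketch: `df2`, `raw`, `tagged` and `n_groups` are hypothetical names, and the edge lists are built with base R.

```r
library(igraph)

# Hypothetical data: loan "02" and borrower "02" are different entities
df2 <- data.frame(house    = c("H_01", "H_02"),
                  loan     = c("01", "02"),
                  borrower = c("02", "03"),
                  stringsAsFactors = FALSE)

# Edge list using the bare values
raw <- rbind(data.frame(x = df2$house, y = df2$loan),
             data.frame(x = df2$house, y = df2$borrower))

# Edge list with each value prefixed by its column name
tagged <- rbind(data.frame(x = df2$house, y = paste("loan", df2$loan, sep = ".")),
                data.frame(x = df2$house, y = paste("borrower", df2$borrower, sep = ".")))

# Number of connected components in the graph built from an edge list
n_groups <- function(edges) {
  components(graph_from_data_frame(edges, directed = FALSE))$no
}

n_groups(raw)     # 1: the shared label "02" wrongly joins the two houses
n_groups(tagged)  # 2: the houses stay in separate groups
```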


A loop-based alternative in base R:

df = data.frame(apply(df, 2, as.character), stringsAsFactors = FALSE)
g = 1
df$group[1] = paste("G",g,sep = "")

# Pass 1: start a new group when a row shares no value with earlier rows;
# otherwise mark the row "CHECK" to be resolved in pass 2
for (i in 2:nrow(df)){
    if (any(df[i,1:3] %in% unlist(df[1:(i-1),1:3]))){
        df$group[i] = "CHECK"
    } else {
        g = g + 1
        df$group[i] = paste("G",g,sep = "")
    }   
}

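# Pass 2: give each "CHECK" row the group of the first row sharing its house, loan, or borrower
# Caveat: a row that links two existing groups takes one label; the groups are not merged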
for (i in 1:nrow(df)){
    if (df$group[i] != "CHECK"){
        next
    }
    if (df$house[i] %in% df$house[1:i]){
        df$group[i] = df$group[match(df$house[i], df$house[1:i])]        
    }
    if (df$loan[i] %in% df$loan[1:i]){
        df$group[i] = df$group[match(df$loan[i], df$loan[1:i])]        
    }
    if (df$borrower[i] %in% df$borrower[1:i]){
        df$group[i] = df$group[match(df$borrower[i], df$borrower[1:i])]       
    }
}

#> df$group
#[1] "G1" "G1" "G1" "G2" "G3"
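The two-pass loop gives the expected labels for the example, but a later row that links two groups already labelled does not merge them. A minimal base-R sketch that handles this case with a small union-find follows; `group_rows` is a hypothetical helper, not part of either answer, and it assumes two rows belong together whenever they share a value in the same column.

```r
# Hypothetical helper: label rows by connected component without igraph
group_rows <- function(df, cols = names(df)) {
  n <- nrow(df)
  parent <- seq_len(n)                 # every row starts as its own root

  find <- function(i) {                # follow parent links up to the root
    while (parent[i] != i) i <- parent[i]
    i
  }

  for (col in cols) {
    first <- match(df[[col]], df[[col]])   # first row holding each value
    for (i in seq_len(n)) {
      a <- find(i)
      b <- find(first[i])
      if (a != b) parent[max(a, b)] <- min(a, b)   # merge, keep smaller root
    }
  }

  roots <- vapply(seq_len(n), find, integer(1))
  sprintf("G_%02d", match(roots, unique(roots)))   # number by first appearance
}

df <- data.frame(house    = c('H_01','H_02','H_03','H_04','H_05'),
                 loan     = c('L_01','L_02','L_02','L_03','L_04'),
                 borrower = c('B_01','B_01','B_02','B_03','B_04'),
                 stringsAsFactors = FALSE)
group_rows(df)
# [1] "G_01" "G_01" "G_01" "G_02" "G_03"

# Row 3 shares a loan with row 1 and a borrower with row 2, so all rows merge
bridge <- data.frame(house    = c('H_01','H_02','H_03'),
                     loan     = c('L_01','L_02','L_01'),
                     borrower = c('B_01','B_02','B_02'),
                     stringsAsFactors = FALSE)
group_rows(bridge)
# [1] "G_01" "G_01" "G_01"
```

On the `bridge` table the two-pass loop would return "G1" "G2" "G2", because row 2 receives its own label before row 3 connects it to row 1.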

Source: https://habr.com/ru/post/1669621/

