I am trying to find a way to create sequential Group_IDs based on "overlapping" variables. The easiest way to describe this is to use the example of a home, loan, and borrower.
Suppose we have the following example:
df <- data.frame(house    = c('H_01','H_02','H_03','H_04','H_05'),
                 loan     = c('L_01','L_02','L_02','L_03','L_04'),
                 borrower = c('B_01','B_01','B_02','B_03','B_04'))
Suppose there can be many relationships among the variables (house, loan, borrower). For example, House 1 (H_01) is associated with Loan 1 (L_01) and Borrower 1 (B_01). But B_01 is also tied to L_02, which is connected to H_02 and also to H_03, so the first 3 rows in my table should be marked G_01 (for group 1).
H_04 is associated with L_03, which does not appear in any other record, and B_03 does not appear in any other record either, so the fourth record should be in G_02. By the same logic, record 5 belongs in its own group, G_03.
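To make the expected result concrete, here is a minimal base R sketch of the grouping described above. It assumes two rows belong together whenever they share a house, loan, or borrower, either directly or through a chain of other rows. The `grp` vector and the `group_id` column name are placeholders of mine, and the brute-force loop is only an illustration, not the elegant solution I am hoping for.

```r
df <- data.frame(house    = c('H_01','H_02','H_03','H_04','H_05'),
                 loan     = c('L_01','L_02','L_02','L_03','L_04'),
                 borrower = c('B_01','B_01','B_02','B_03','B_04'))

# Start with every row in its own group, then repeatedly merge rows that
# share a house, loan, or borrower until the labels stop changing.
grp <- seq_len(nrow(df))
repeat {
  old <- grp
  for (col in c("house", "loan", "borrower")) {
    # Within each shared value, give every row the smallest label seen so far.
    grp <- ave(grp, df[[col]], FUN = min)
  }
  if (identical(grp, old)) break
}

# Renumber the surviving labels sequentially and format them as G_01, G_02, ...
df$group_id <- sprintf("G_%02d", match(grp, unique(grp)))
df
# group_id column: G_01, G_01, G_01, G_02, G_03
```

This handles the five example rows, but it rescans every column on each pass, so I would expect it to be slow on large tables.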
Is there an elegant way to achieve this grouping into G_01, G_02, and G_03, ideally with dplyr (preferred, but not required)?