Suppose we have 3 rules:
[1] {A,B,D} -> {C}
[2] {A,B} -> {C}
[3] Whatever it is
Rule [2] is a subset of rule [1] (because rule [1] contains all the elements of rule [2]), so rule [1] must be removed (because rule [1] is too specific and its information is already included in rule [2]).
I searched the internet, and everyone seems to use this code to remove redundant rules:
subset.matrix <- is.subset(rules.sorted, rules.sorted)
subset.matrix[lower.tri(subset.matrix, diag=T)] <- NA
redundant <- colSums(subset.matrix, na.rm=T) >= 1
which(redundant)
rules.pruned <- rules.sorted[!redundant]
I do not understand how the code works.
After line 2 of the code, part of the subset matrix becomes:
     [,1] [,2] [,3]
[1,]   NA    1    0
[2,]   NA   NA    0
[3,]   NA   NA   NA
The cells of the lower triangle are set to NA, and since rule [2] is a subset of rule [1], the corresponding cell has a value of 1. Thus, I have 2 questions:
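1. Why do we have to set the lower triangle of the matrix to NA? If we do, how can we check whether rule [2] or [3] is a subset of rule [1]? (Those cells have been set to NA.)
2. In my example, rule [1] should be the one that is removed, but this code removes rule [2] instead of rule [1]. (The first cell of column 2 has the value 1, so by line 3 of the code the sum of column 2 is >= 1, and that rule is treated as redundant.)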
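The behaviour I am asking about can be reproduced without arules. Below is a minimal base-R sketch that applies lines 2 and 3 of the code to a hand-built matrix. The matrix `m` is my own stand-in, not real `is.subset` output: only its upper-triangle values are copied from the matrix shown above, and the diagonal and lower-triangle values are placeholders, since line 2 overwrites them anyway.

```r
# Hand-built stand-in for subset.matrix (an assumption, not arules output).
# Only the upper triangle matters: [1,2] = 1, [1,3] = 0, [2,3] = 0.
m <- matrix(c(1, 1, 0,
              0, 1, 0,
              0, 0, 1),
            nrow = 3, byrow = TRUE)

# Line 2 of the code: set the diagonal and the lower triangle to NA.
m[lower.tri(m, diag = TRUE)] <- NA

# Line 3 of the code: flag every column whose remaining entries sum to >= 1.
redundant <- colSums(m, na.rm = TRUE) >= 1

print(m)                 # same layout as the matrix shown above
print(which(redundant))  # index of the flagged column
```

On this matrix the only flagged index is 2, which is why it looks to me as if rule [2], not rule [1], gets dropped.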
Thanks!