Association rules in R: removing redundant rules (arules)

Suppose we have 3 rules:

[1] {A,B,D} -> {C}

[2] {A,B} -> {C}

[3] Whatever it is

Rule [2] is a subset of rule [1] (because rule [1] contains all the items in rule [2]), so rule [1] should be removed (because rule [1] is too specific and its information is already contained in rule [2]).

I searched the internet, and everyone uses this code to remove redundant rules:

subset.matrix <- is.subset(rules.sorted, rules.sorted)
subset.matrix[lower.tri(subset.matrix, diag=T)] <- NA
redundant <- colSums(subset.matrix, na.rm=T) >= 1
which(redundant)
rules.pruned <- rules.sorted[!redundant]

I do not understand how the code works.

After line 2 of the code, subset.matrix becomes:

      [,1] [,2] [,3]
[1,]   NA    1    0
[2,]   NA   NA    0
[3,]   NA   NA   NA

The cells of the lower triangle are set to NA, and since rule [2] is a subset of rule [1], the corresponding cell has a value of 1. So I have 2 questions:

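  • Why is the lower triangle set to NA? If we do that, how can we check whether rule [3] is a subset of rule [2]? (That cell has been set to NA.)

  • In our case rule [1] should be removed, but this code removes rule [2] instead of rule [1]. (The first cell in column 2 is 1, so according to line 3 of the code the sum of column 2 is >= 1, and rule [2] is treated as redundant.)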


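For this code to work you need an interest measure (for example, confidence or lift), and rules.sorted must be sorted by that measure. The code is also very inefficient, because is.subset() creates a matrix of size n^2, where n is the number of rules. In addition, is.subset for rules merges the rhs and lhs of each rule, which is not correct here. So do not worry too much about the implementation details.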
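To make the role of the sort order concrete, here is a minimal base R sketch of the same matrix trick on three toy rules. It does not call arules: the rule list, the item names, and the confidence values are all made up for illustration, and the subset test merges lhs and rhs the way the snippet in the question does.

```r
# Toy rules, assumed to be already sorted by confidence (highest first).
# All items and confidence values are hypothetical.
rules <- list(
  list(items = c("A", "B", "C"),      conf = 0.9),  # {A,B} -> {C}
  list(items = c("A", "B", "D", "C"), conf = 0.8),  # {A,B,D} -> {C}
  list(items = c("E", "F"),           conf = 0.7)   # an unrelated rule
)
n <- length(rules)

# m[i, j] is TRUE when every item of rule i also occurs in rule j.
m <- matrix(FALSE, n, n)
for (i in seq_len(n)) for (j in seq_len(n)) {
  m[i, j] <- all(rules[[i]]$items %in% rules[[j]]$items)
}

# Keep only the pairs with i < j, i.e. a higher-ranked rule i that is
# contained in a lower-ranked rule j. The other pairs are not needed.
m[lower.tri(m, diag = TRUE)] <- NA

# Rule j is flagged when at least one higher-ranked rule is contained in it.
redundant <- colSums(m, na.rm = TRUE) >= 1
which(redundant)
```

With this ordering, the flagged rule is the more specific {A,B,D} -> {C}, because the more general rule is ranked above it. If the rules are not sorted by an interest measure first, the position of a rule in the matrix says nothing about which of the two rules is better, which is why the unsorted example in the question appears to flag the wrong rule.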

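A more efficient way to do this is now implemented as the function is.redundant() in the arules package (available since version 1.4-2). The explanation from the manual page: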

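A rule is redundant if a more general rule with the same or a higher confidence exists. That is, a more specific rule is redundant if it is only equally or even less predictive than a more general rule. A rule is more general if it has the same RHS but one or more items removed from the LHS. Formally, a rule X → Y is redundant if

for some X' ⊂ X, conf(X' → Y) >= conf(X → Y).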

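This is equivalent to a negative or zero improvement as defined by Bayardo et al. (2000). In your example, neither rule [1] nor rule [2] is necessarily removed, since redundancy depends on the confidence of the two rules.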

For more details, see ?is.redundant.
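The redundancy criterion (a more general rule with the same RHS and the same or higher confidence) can be sketched in a few lines of base R. This is only an illustration of the definition, not the arules implementation: the helper names is_proper_subset and is_redundant_toy are hypothetical, and the confidence values are invented.

```r
# Toy rules; the confidence values are made up for illustration.
rules <- list(
  list(lhs = c("A", "B", "D"), rhs = "C", conf = 0.80),
  list(lhs = c("A", "B"),      rhs = "C", conf = 0.85),
  list(lhs = c("A", "D"),      rhs = "C", conf = 0.60)
)

# TRUE when x is a proper subset of y (items are assumed to be unique).
is_proper_subset <- function(x, y) all(x %in% y) && length(x) < length(y)

# A rule is redundant if some more general rule in the set (same RHS,
# LHS a proper subset) has the same or a higher confidence.
is_redundant_toy <- function(rule, all_rules) {
  any(vapply(all_rules, function(g) {
    identical(g$rhs, rule$rhs) &&
      is_proper_subset(g$lhs, rule$lhs) &&
      g$conf >= rule$conf
  }, logical(1)))
}

redundant <- vapply(rules, is_redundant_toy, logical(1), all_rules = rules)
pruned <- rules[!redundant]
```

Here only {A,B,D} -> {C} is flagged, because {A,B} -> {C} is more general and has higher confidence. The more general rule {A,D} -> {C} has lower confidence (0.60), so on its own it would not make the specific rule redundant.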


Removing redundant rules with arules...

Run apriori:

rules <- apriori(transDat, parameter = list(supp = 0.01, conf = 0.5, target = "rules", maxlen = 3))

Remove the redundant rules:

rules <- rules[!is.redundant(rules)]

Inspect the result:

arules::inspect(rules)

Convert the rules to a data frame:

df <- data.frame(
  lhs = labels(lhs(rules)),
  rhs = labels(rhs(rules)),
  quality(rules))

Source: https://habr.com/ru/post/1650353/

