How can I get the frequent itemsets and their support from an apriori call in R?

Problem:

The apriori function in the arules package infers association rules from input transactions and reports the support, confidence, and lift of each rule. Association rules are derived from frequent itemsets. I would like to get the frequent itemsets of the input transactions; in particular, all itemsets that meet a minimum support. The support of an itemset is the ratio of the number of transactions containing that itemset to the total number of transactions.

Requirements:

  • I would prefer to get the frequent itemsets from the intermediate results of the apriori function. That is, I would prefer not to write a program from scratch just to compute frequent itemsets, because apriori already computes them as an intermediate step. However, if there really is no reasonable way to access apriori's intermediate results, I am open to other solutions.
  • I would prefer not to perform string manipulation on the output of apriori, because that approach would depend too heavily on the string representation of its result. Again, if it turns out that there are no better alternatives, I can resort to it.
  • I know about the itemFrequency function provided by the arules package. Unfortunately, it only reports the frequencies of single-item itemsets. I am interested in all itemsets of any length that meet the minimum support.
  • I would like the result to be sorted numerically by support (descending), and then lexicographically by itemset.

Input Example:

    a,b
    a,b,c

Program:

    # The following is how I'm using apriori to infer the association rules.
    library(package = "arules")
    transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",")
    rules = apriori(transactions, parameter = list(minlen = 1, sup = 0.001, conf = 0.001))
    # R is case-sensitive: the arules method is write(), not WRITE().
    write(rules, file = "", sep = ",", quote = TRUE, col.names = NA)

Current output:

    "","rules","support","confidence","lift"
    "1","{} => {c}",0.5,0.5,1
    "2","{} => {b}",1,1,1
    "3","{} => {a}",1,1,1
    "4","{c} => {b}",0.5,1,1
    "5","{b} => {c}",0.5,0.5,1
    "6","{c} => {a}",0.5,1,1
    "7","{a} => {c}",0.5,0.5,1
    "8","{b} => {a}",1,1,1
    "9","{a} => {b}",1,1,1
    "10","{b,c} => {a}",0.5,1,1
    "11","{a,c} => {b}",0.5,1,1
    "12","{a,b} => {c}",0.5,0.5,1

Desired Result:

    "itemset","support"
    "{a}",1
    "{a,b}",1
    "{b}",1
    "{a,b,c}",0.5
    "{a,c}",0.5
    "{b,c}",0.5
    "{c}",0.5
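To make the definition of support concrete, here is a brute-force sketch in base R (no arules) that enumerates every itemset over the example input and reproduces the desired result. It only illustrates the definition and the requested ordering; it is not a substitute for apriori, since enumerating all subsets grows exponentially with the number of items. The hard-coded transactions, the `min_support` value, and the variable names are assumptions made for the example.

```r
# Example input from the question, one character vector per basket
# (hard-coded here as an assumption instead of reading stdin).
transactions <- list(c("a", "b"), c("a", "b", "c"))

# All distinct items, sorted so every generated itemset is in canonical order.
items <- sort(unique(unlist(transactions)))

# Enumerate every non-empty subset of the item universe (2^n - 1 itemsets).
itemsets <- unlist(
  lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
  recursive = FALSE
)

# Support = fraction of transactions that contain every item of the itemset.
support <- vapply(itemsets, function(s) {
  mean(vapply(transactions, function(t) all(s %in% t), logical(1)))
}, numeric(1))

# Keep itemsets meeting the (illustrative) minimum support.
min_support <- 0.001
keep <- support >= min_support
keys <- vapply(itemsets, paste, character(1), collapse = ",")[keep]

result <- data.frame(itemset = paste0("{", keys, "}"),
                     support = support[keep],
                     stringsAsFactors = FALSE)

# Sort by support descending, then by the itemset without braces in C-locale
# order (method = "radix"), so that {a} precedes {a,b} regardless of locale.
result <- result[order(-result$support, keys, method = "radix"), ]
rownames(result) <- NULL

write.table(result, file = "", sep = ",", row.names = FALSE)
```

The braces are stripped for the sort key because in C-locale order `,` sorts before `}`, which would otherwise put `{a,b}` ahead of `{a}`.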
Answer:

I found the generatingItemsets function in the arules package reference manual:

    library(package = "arules")
    transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",")
    rules = apriori(transactions, parameter = list(minlen = 1, sup = 0.001, conf = 0.001))
    # One generating itemset (LHS plus RHS) per rule; drop duplicates.
    itemsets <- unique(generatingItemsets(rules))
    itemsets.df <- as(itemsets, "data.frame")
    # Sort by support (descending), then by itemset.
    frequentItemsets <- itemsets.df[with(itemsets.df, order(-support, items)), ]
    names(frequentItemsets)[1] <- "itemset"
    write.table(frequentItemsets, file = "", sep = ",", row.names = FALSE)
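Note that generatingItemsets returns one itemset per rule (the union of its left- and right-hand sides), so it only recovers frequent itemsets that produce at least one rule passing the confidence threshold; that is why the very low conf = 0.001 matters above. A possibly more direct route, sketched below, is to ask apriori for frequent itemsets instead of rules through its `target` parameter, which skips rule generation entirely. The sketch builds the transactions in memory rather than reading stdin so that it runs on its own; `labels()` and `quality()` are standard arules accessors, while the brace-free sort key is only an illustrative choice to match the ordering in the desired result.

```r
library(arules)

# The question's example input, built in memory instead of read from stdin.
transactions <- as(list(c("a", "b"), c("a", "b", "c")), "transactions")

# Mine frequent itemsets directly; no confidence threshold is involved.
itemsets <- apriori(transactions,
                    parameter = list(supp = 0.001, target = "frequent itemsets"),
                    control = list(verbose = FALSE))

# labels() gives strings such as "{a,b}"; quality() holds the support column.
frequentItemsets <- data.frame(itemset = labels(itemsets),
                               support = quality(itemsets)$support,
                               stringsAsFactors = FALSE)

# Sort by support (descending), then by the itemset without braces in
# C-locale order, so that {a} precedes {a,b}.
key <- gsub("[{}]", "", frequentItemsets$itemset)
frequentItemsets <- frequentItemsets[order(-frequentItemsets$support, key,
                                           method = "radix"), ]
rownames(frequentItemsets) <- NULL

write.table(frequentItemsets, file = "", sep = ",", row.names = FALSE)
```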

Source: https://habr.com/ru/post/1390922/

