The arules package in R works with objects of class transactions. Therefore, to use the apriori() function, I need to convert my existing data. I have a matrix with 2 columns and approximately 1.6 million rows, and I tried to convert the data as follows:
transaction_data <- as(split(original_data[,"id"], original_data[,"type"]), "transactions")
where original_data is my data matrix. Due to the amount of data, I used the largest Amazon AWS machine, with 64 GB of RAM. After a while I get the error:
resulting vector exceeds vector length limit in 'AnswerType'
Memory usage on the machine is still "only" at 60%. Is this a limitation of R itself? Is there any way around it other than sampling? Using only 1/4 of the data, the conversion worked fine.
Edit: As pointed out, one of the variables was a factor instead of character. After changing it, the conversion ran quickly and correctly.
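For anyone hitting the same problem, here is a minimal sketch of the fix described in the edit. It assumes the data is held in a data frame (a plain matrix cannot store factors), and the toy values below are made up for illustration; only the column names id and type come from the question. The arules step is guarded so the rest still runs if the package is not installed.

```r
# Toy stand-in for original_data (hypothetical values, not the real data).
original_data <- data.frame(
  id   = c("a", "b", "a", "c", "b"),
  type = c("x", "x", "y", "y", "y"),
  stringsAsFactors = TRUE  # reproduce the situation where columns are factors
)

# Convert every factor column to character before splitting.
is_fac <- vapply(original_data, is.factor, logical(1))
original_data[is_fac] <- lapply(original_data[is_fac], as.character)

# One character vector of ids per type, as in the original call.
item_list <- split(original_data[, "id"], original_data[, "type"])

# Coerce to transactions only if arules is available.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  transaction_data <- as(item_list, "transactions")
}
```

Checking the columns with str(original_data) before the conversion shows quickly whether a factor has slipped in.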
Marco