Efficient way to create a market basket matrix in R

I am trying to create a market basket matrix from data that looks like this:

input <- matrix( c(1000001,1000001,1000001,1000001,1000001,1000001,1000002,1000002,1000002,1000003,1000003,1000003,100001,100002,100003,100004,100005,100006,100002,100003,100007,100002,100003,100008), ncol=2) 

The two columns hold the following data:

 colnames(input) <- c( "Customer" , "Product" ) 

From this I want to create a matrix with the customers as rows and all products as columns. This can be done by first creating the matrix filled with zeros:

 input <- as.data.frame(input)
 m <- matrix(0, length(unique(input$Customer)), length(unique(input$Product)))
 rownames(m) <- unique(input$Customer)
 colnames(m) <- unique(input$Product)

This is fast enough (I have data with 750,000+ rows, resulting in a 15,000 by 1,500 matrix), but now I want to fill in the matrix where appropriate:

 for (i in 1:nrow(input)) {
   m[as.character(input[i, 1]), as.character(input[i, 2])] <- 1
 }

I think there should be a more efficient way to do this, since I have learned from Stack Overflow that for loops can often be avoided. So the question is: is there a faster way?

I also need the data in a matrix, because I would like to use packages like caret. After that I will probably run into the same problem as in "R memory management advice (caret, model matrices, data frames)", but that is a worry for later.
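As a minimal sketch of one loop-free option in base R, the fill step can be written as a single assignment that indexes the matrix with a two-column character matrix of (row name, column name) pairs. The toy data and the names `purchases`, `basket` and `idx` are illustrative assumptions, not part of the original post.

```r
# Hypothetical toy data standing in for the real Customer/Product table
purchases <- data.frame(
  Customer = c(1000001, 1000001, 1000002),
  Product  = c(100001, 100002, 100002)
)

customers <- as.character(unique(purchases$Customer))
products  <- as.character(unique(purchases$Product))

# Zero matrix with customers as row names and products as column names
basket <- matrix(0, length(customers), length(products),
                 dimnames = list(customers, products))

# Each row of idx is one (row name, column name) pair; a character
# matrix index is matched against the dimnames of the target matrix
idx <- cbind(as.character(purchases$Customer),
             as.character(purchases$Product))
basket[idx] <- 1
```

Repeated purchases of the same product by the same customer simply set the same cell to 1 again, so the result stays binary.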

3 answers

You do not need reshape2; table is what you are looking for.

 m1 <- as.matrix(as.data.frame.matrix(table(input)))
 all.equal(m, m1)
 # [1] TRUE
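One caveat worth sketching: table counts occurrences, so if the same customer/product pair appears more than once, the cell holds a count rather than 1. The example below uses made-up data (the names `purchases`, `counts` and `basket` are assumptions for illustration) and collapses the counts to a 0/1 matrix.

```r
# Hypothetical toy data: customer 1 bought product 10 twice
purchases <- data.frame(
  Customer = c(1, 1, 1, 2),
  Product  = c(10, 10, 20, 20)
)

# Cross-tabulate, then convert the table to a plain matrix of counts
counts <- as.matrix(as.data.frame.matrix(table(purchases)))

# Any positive count becomes 1, everything else stays 0
basket <- (counts > 0) * 1
```

With the sample data from the question there are no repeated pairs, so this extra step changes nothing there.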

The reshape2 package has a casting function that will complete the task:

 require(reshape2)
 m <- acast(input, Customer ~ Product, function(x) 1, fill = 0)
 m

gives me

         100001 100002 100003 100004 100005 100006 100007 100008
 1000001      1      1      1      1      1      1      0      0
 1000002      0      1      1      0      0      0      1      0
 1000003      0      1      1      0      0      0      0      1

Hope this is what you were looking for?


You can use a sparse matrix:

 library(Matrix)
 # Convert both columns to factors so that their integer codes can be used as
 # indices (since R 4.0 strings are no longer converted to factors by default)
 input <- as.data.frame(apply(input, 2, as.character), stringsAsFactors = TRUE)
 m <- sparseMatrix(
   i = as.numeric(input[, 1]),
   j = as.numeric(input[, 2]),
   x = 1,
   dims = c(length(levels(input[, 1])), length(levels(input[, 2]))),
   dimnames = list(levels(input[, 1]), levels(input[, 2]))
 )
 m
 # 3 x 8 sparse Matrix of class "dgCMatrix"
 #         100001 100002 100003 100004 100005 100006 100007 100008
 # 1000001      1      1      1      1      1      1      .      .
 # 1000002      .      1      1      .      .      .      1      .
 # 1000003      .      1      1      .      .      .      .      1
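Since the question asks for an ordinary matrix, here is a small sketch, under assumed toy data, of building the sparse matrix from factor codes and then converting it to a dense base R matrix with as.matrix. The names `purchases`, `cust`, `prod`, `sm` and `dense` are hypothetical.

```r
library(Matrix)

# Hypothetical toy data standing in for the real Customer/Product table
purchases <- data.frame(
  Customer = c(1000001, 1000001, 1000002),
  Product  = c(100001, 100002, 100002)
)

# Factors give each distinct customer/product a 1-based integer code
cust <- factor(purchases$Customer)
prod <- factor(purchases$Product)

sm <- sparseMatrix(
  i = as.integer(cust),
  j = as.integer(prod),
  x = 1,
  dims = c(nlevels(cust), nlevels(prod)),
  dimnames = list(levels(cust), levels(prod))
)

# Dense copy for code that expects a base matrix; for a 15,000 by 1,500
# matrix this allocates every cell, so convert only when it is needed
dense <- as.matrix(sm)
```

Whether the dense conversion is needed depends on the downstream package; keeping the sparse form as long as possible saves memory when most cells are zero.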

Source: https://habr.com/ru/post/1492076/

