Efficient way to create a market basket matrix in R

Question

I am trying to create a market basket matrix from data that looks like this:

 input <- matrix(
   c(1000001, 1000001, 1000001, 1000001, 1000001, 1000001,
     1000002, 1000002, 1000002, 1000003, 1000003, 1000003,
     100001, 100002, 100003, 100004, 100005, 100006,
     100002, 100003, 100007, 100002, 100003, 100008),
   ncol = 2)

The first column holds the customer ID and the second the product ID:

 colnames(input) <- c( "Customer" , "Product" )

From this I want to build a matrix with one row per customer and one column per product. I start by creating a matrix of zeros:

 input <- as.data.frame(input)
 m <- matrix(0, length(unique(input$Customer)), length(unique(input$Product)))
 rownames(m) <- unique(input$Customer)
 colnames(m) <- unique(input$Product)

This is fast enough (I have data with 750,000+ rows, producing a 15,000 by 1,500 matrix), but now I want to fill in the matrix where necessary:

 for (i in 1:nrow(input)) {
   m[as.character(input[i, 1]), as.character(input[i, 2])] <- 1
 }

I think there should be a more efficient way to do this, since I have learned on Stack Overflow that for loops can often be avoided. So the question is: is there a faster way?

I need the data in a matrix because I would like to use packages like caret. After that I will probably run into the same problem as in "R memory management advice (caret, model matrices, data frames)", but that is a worry for later.

+4

matrix r

Freddy Jul 18 '13 at 12:16

3 answers

The reshape2 package has a cast function, acast, that will do the job:

 require(reshape2)
 m <- acast(input, Customer ~ Product, function(x) 1, fill = 0)
 m

gives me

         100001 100002 100003 100004 100005 100006 100007 100008
 1000001      1      1      1      1      1      1      0      0
 1000002      0      1      1      0      0      0      1      0
 1000003      0      1      1      0      0      0      0      1

Hope this is what you were looking for?

+2

ninjasnowman Jul 18 '13 at 12:39

You can use a sparse matrix:

 library(Matrix)
 # Turn both columns into factors; their integer codes serve as row/column indices.
 # stringsAsFactors = TRUE is required on R >= 4.0, where it is no longer the default.
 input <- as.data.frame(apply(input, 2, as.character), stringsAsFactors = TRUE)
 m <- sparseMatrix(
   i = as.numeric(input[, 1]),
   j = as.numeric(input[, 2]),
   x = 1,
   dims = c(length(levels(input[, 1])), length(levels(input[, 2]))),
   dimnames = list(levels(input[, 1]), levels(input[, 2]))
 )
 m
 # 3 x 8 sparse Matrix of class "dgCMatrix"
 #         100001 100002 100003 100004 100005 100006 100007 100008
 # 1000001      1      1      1      1      1      1      .      .
 # 1000002      .      1      1      .      .      .      1      .
 # 1000003      .      1      1      .      .      .      .      1
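One possible caveat, sketched here on the assumption that the real data can contain the same customer/product pair more than once (the sample data does not): sparseMatrix() adds up the x values of repeated (i, j) pairs, so such a cell would hold 2 rather than 1. Dropping duplicate rows with unique() first keeps the matrix binary. The data frame `purchases` and the helper `build` are hypothetical names used only for this illustration.

```r
library(Matrix)

# Hypothetical data: customer "A" bought product "p1" twice.
purchases <- data.frame(
  Customer = factor(c("A", "A", "A", "B")),
  Product  = factor(c("p1", "p1", "p2", "p2"))
)

# Illustrative helper: builds the sparse customer-by-product matrix from factor codes.
build <- function(d) {
  sparseMatrix(
    i = as.integer(d$Customer),
    j = as.integer(d$Product),
    x = 1,
    dims = c(nlevels(d$Customer), nlevels(d$Product)),
    dimnames = list(levels(d$Customer), levels(d$Product))
  )
}

counts <- build(purchases)          # the repeated pair is summed into one cell
binary <- build(unique(purchases))  # duplicates dropped first, so cells stay 0/1
```

Whether counts or 0/1 flags are wanted depends on the downstream model, so this is a choice to make rather than a fix.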

+1

Vincent Zoonekynd Jul 18 '13 at 12:46

shadow · Accepted Answer · 2013-07-18T12:48:51+0000

You do not need reshape2; table is what you are looking for.

 m1 <- as.matrix(as.data.frame.matrix(table(input)))
 all.equal(m, m1)
 # [1] TRUE
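For completeness, the loop in the question can also be replaced without any package by indexing the zero matrix with a two-column character matrix, which base R matches against the row and column names. This is a sketch on the sample data and not part of the answer above; `customers`, `products`, `idx` and `m2` are names introduced here.

```r
# Sample data from the question, as a data frame.
input <- data.frame(
  Customer = c(rep(1000001, 6), rep(1000002, 3), rep(1000003, 3)),
  Product  = c(100001:100006, 100002, 100003, 100007, 100002, 100003, 100008)
)

customers <- as.character(unique(input$Customer))
products  <- as.character(unique(input$Product))

# Zero matrix with one row per customer and one column per product.
m2 <- matrix(0, length(customers), length(products),
             dimnames = list(customers, products))

# Each row of idx is a (row name, column name) pair;
# all matching cells are set in a single assignment instead of a loop.
idx <- cbind(as.character(input$Customer), as.character(input$Product))
m2[idx] <- 1
```

Because each cell is simply set to 1, repeated customer/product pairs do not change the result, whereas table() counts them.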
