I am trying to create a market basket matrix from data that looks like this:
input <- matrix( c(1000001,1000001,1000001,1000001,1000001,1000001,1000002,1000002,1000002,1000003,1000003,1000003,100001,100002,100003,100004,100005,100006,100002,100003,100007,100002,100003,100008), ncol=2)
The two columns represent the following:
colnames(input) <- c( "Customer" , "Product" )
From this, a matrix is created with one row per customer and one column per product. This can be done by first creating a matrix filled with zeros:
input <- as.data.frame(input)
m <- matrix(0, length(unique(input$Customer)), length(unique(input$Product)))
rownames(m) <- unique(input$Customer)
colnames(m) <- unique(input$Product)
This is fast enough (I have data with 750,000+ rows, which produces a 15,000 by 1,500 matrix), but now I want to fill in the matrix where necessary:
for( i in 1:nrow(input) ) { m[ as.character(input[i,1]),as.character(input[i,2])] <- 1 }
I think there should be a more efficient way to do this, since I have learned from Stack Overflow that for loops can often be avoided. So the question is: is there a faster way?
I need the data in a matrix because I would like to use packages like caret. After that I will probably run into the same problem as in "R memory management advice (caret, model matrices, data frames)", but that is a concern for later.
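For the fill step, one loop-free possibility is to index the matrix with a two-column matrix of (row, column) positions, so that every customer/product pair is set in a single assignment. The following is only a sketch on the sample data above and has not been benchmarked on the full data set; the names `customers`, `products`, and `idx` are illustrative and not part of the original code.

```r
# Same sample data as in the question, built directly as a data frame.
input <- data.frame(
  Customer = c(rep(1000001, 6), rep(1000002, 3), rep(1000003, 3)),
  Product  = c(100001, 100002, 100003, 100004, 100005, 100006,
               100002, 100003, 100007, 100002, 100003, 100008)
)

# Distinct IDs define the rows and columns of the basket matrix.
customers <- unique(input$Customer)
products  <- unique(input$Product)

# Zero-filled matrix with the IDs as dimension names.
m <- matrix(0, length(customers), length(products),
            dimnames = list(as.character(customers), as.character(products)))

# match() converts each ID into its row/column position; cbind() pairs them
# into a two-column index matrix, one (row, column) pair per input row.
idx <- cbind(match(input$Customer, customers),
             match(input$Product, products))

# A single vectorized assignment replaces the for loop.
m[idx] <- 1
```

Repeated customer/product pairs simply set the same cell to 1 again, so the result should match what the loop produces.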