Calculating lift values

Question

I have a (symmetric) adjacency matrix that was created based on the joint presence of names (e.g. Greg, Mary, Sam, Tom) in newspaper articles (e.g. a, b, c, d). See below.

How can I calculate the lift value for the non-zero matrix elements ( http://en.wikipedia.org/wiki/Lift_(data_mining) )?

I would be interested in an efficient implementation that can also be used for very large matrices (for example, millions of non-zero elements).

I appreciate any help.

    # Load package
    library(Matrix)

    # Data
    A <- new("dgTMatrix"
             , i = c(2L, 2L, 2L, 0L, 3L, 3L, 3L, 1L, 1L)
             , j = c(0L, 1L, 2L, 0L, 1L, 2L, 3L, 1L, 3L)
             , Dim = c(4L, 4L)
             , Dimnames = list(c("Greg", "Mary", "Sam", "Tom"), c("a", "b", "c", "d"))
             , x = c(1, 1, 1, 1, 1, 1, 1, 1, 1)
             , factors = list()
    )

    # > A
    # 4 x 4 sparse Matrix of class "dgTMatrix"
    #      a b c d
    # Greg 1 . . .
    # Mary . 1 . 1
    # Sam  1 1 1 .
    # Tom  . 1 1 1

    # One-mode projection of the data
    # (i.e. the final adjacency matrix, which is the basis for the lift value calculation)
    A.final <- tcrossprod(A)

    # > A.final
    # 4 x 4 sparse Matrix of class "dsCMatrix"
    #      Greg Mary Sam Tom
    # Greg    1    .   1   .
    # Mary    .    2   1   2
    # Sam     1    1   3   2
    # Tom     .    2   2   3

+5

matrix r data-mining

majom, Sep 30 '14 at 21:34

1 answer

aymer · Answer 1 · 2014-10-15T13:15:10+0000

Here is something that may help, although it is certainly not the most efficient implementation.

    ComputeLift <- function(data, projection) {
      # Initialize a matrix to store the results.
      lift <- matrix(NA, nrow = nrow(projection), ncol = ncol(projection))

      # Loop over all pairs in the projection matrix.
      for (i in 1:nrow(projection)) {
        for (j in 1:ncol(projection)) {
          # The probability of observing both names in the same article is the
          # number of articles where the names appear together divided by the
          # total number of articles.
          pAB <- projection[i, j] / ncol(data)

          # The probability of a name appearing in an article is the number of
          # articles where the name appears divided by the total number of articles.
          # Use the `data` argument here, not the global `A`.
          pA <- sum(data[i, ]) / ncol(data)
          pB <- sum(data[j, ]) / ncol(data)

          # The lift is the probability of observing both names in an article
          # divided by the product of the probabilities of observing each name.
          lift[i, j] <- pAB / (pA * pB)
        }
      }
      lift
    }

    ComputeLift(data = A, projection = A.final)
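For very large matrices, the double loop above and its dense result matrix are likely to become the bottleneck. Since the lift of a pair reduces to n * c_AB / (c_A * c_B), where n is the number of articles, c_AB the co-occurrence count and c_A, c_B the per-name article counts, one possible alternative is to scale the projection by a diagonal matrix on both sides. The following is only a sketch under these assumptions: the function name `ComputeLiftSparse` is made up for illustration, every name is assumed to appear in at least one article, and it has not been benchmarked on large data.

```r
library(Matrix)

# Example data from the question: names (rows) by articles (columns).
A <- sparseMatrix(
  i = c(3, 3, 3, 1, 4, 4, 4, 2, 2),
  j = c(1, 2, 3, 1, 2, 3, 4, 2, 4),
  x = rep(1, 9),
  dims = c(4, 4),
  dimnames = list(c("Greg", "Mary", "Sam", "Tom"), c("a", "b", "c", "d"))
)
A.final <- tcrossprod(A)

# Hypothetical helper (not from the original answer): computes the lift
# for all pairs at once without looping over matrix cells.
ComputeLiftSparse <- function(data, projection) {
  n <- ncol(data)           # total number of articles
  counts <- rowSums(data)   # number of articles each name appears in
  # Assumption: no name has zero articles, otherwise we would divide by zero.
  stopifnot(all(counts > 0))

  # lift[i, j] = n * projection[i, j] / (counts[i] * counts[j]).
  # Multiplying by a diagonal matrix rescales rows (left) and columns (right)
  # and only touches the stored non-zero entries.
  D <- Diagonal(x = 1 / counts)
  lift <- n * (D %*% projection %*% D)
  dimnames(lift) <- dimnames(projection)
  lift
}

L <- ComputeLiftSparse(data = A, projection = A.final)
L
```

Zero entries of the projection stay zero, so only the pairs that actually co-occur carry a lift value. If only the upper triangle is needed, the result could be passed through `triu()` afterwards.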
