R: compare all pairs of columns in a matrix

I have a matrix with 41 rows and 6 columns. Here is the first part.

          X13 X15 X17  X19  X21  X23
     [1,] "7" "6" "5"  "8"  "1"  "8"
     [2,] "7" "6" "5"  "8"  "14" "3"
     [3,] "7" "6" "1"  "3"  "12" "3"
     [4,] "7" "6" "1"  "5"  "6"  "14"
     [5,] "2" "6" "1"  "5"  "16" "3"
     [6,] "2" "3" "5"  "5"  "2"  "3"
     [7,] "7" "5" "5"  "17" "7"  "3"
     [8,] "7" "2" "5"  "2"  "2"  "14"
     [9,] "2" "2" "10" "10" "2"  "3"
    [10,] "2" "2" "10" "5"  "2"  "6"

My goal is to compare every column with every other column and count how many entries match, row by row, in each pair of columns. For a single pair I tried this:

 s <- sum(matrix[,1]==matrix[,2]) 

But since I need to compare all possible pairs, doing this by hand is not practical. It would be nice to put this in a loop, but I have no idea how to do that.

I would like to get my result as a 6x6 similarity matrix, something like this:

        X13 X15 X17 X19 X21 X23
    X13   0   0   3   2   2   3
    X15   0   0   9  11   4   6
    X17   3   9   0   5   1   3
    X19   2  11   5   0   9  10
    X21   2   4   1   9   0   9
    X23   3   6   3  10   9   0

As you can see, I would like to put zeros in the matrix where a column is compared with itself.

Since I am new to R, this task seems really difficult for me. I need to run this comparison on 50 matrices, so I would be glad if you could help me. I would appreciate any advice or suggestions. My English is also not very good, but I hope I have explained my problem clearly. :)

3 answers

A non-vectorized, but perhaps more memory-efficient, way:

    # Fancy way.
    similarity.matrix <- apply(matrix, 2, function(x) colSums(x == matrix))
    diag(similarity.matrix) <- 0

    # More understandable. But verbose.
    similarity.matrix <- matrix(nrow = ncol(matrix), ncol = ncol(matrix))
    for (col in 1:ncol(matrix)) {
      matches <- matrix[, col] == matrix
      match.counts <- colSums(matches)
      match.counts[col] <- 0  # Set the same column comparison to zero.
      similarity.matrix[, col] <- match.counts
    }
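Since the question mentions repeating this for 50 matrices, the apply version could be wrapped in a function and mapped over a list. This is only a minimal sketch, assuming the matrices are already collected in a list and each has at least two columns; `column_similarity`, `mats`, and the second matrix `m2` are hypothetical names and data, not part of the original post.

```r
# Hypothetical helper (not from the original answer): counts, for every pair
# of columns, the rows in which both columns hold the same value.
column_similarity <- function(mat) {
  # colSums(x == mat) compares column x against every column of mat at once.
  sim <- apply(mat, 2, function(x) colSums(x == mat))
  diag(sim) <- 0  # a column compared with itself is reported as zero
  dimnames(sim) <- list(colnames(mat), colnames(mat))
  sim
}

# First four rows of the matrix shown in the question.
m1 <- matrix(c("7", "6", "5", "8", "1",  "8",
               "7", "6", "5", "8", "14", "3",
               "7", "6", "1", "3", "12", "3",
               "7", "6", "1", "5", "6",  "14"),
             ncol = 6, byrow = TRUE,
             dimnames = list(NULL, c("X13", "X15", "X17", "X19", "X21", "X23")))

# Made-up second matrix, only to illustrate the list workflow.
m2 <- m1[1:2, ]

mats <- list(a = m1, b = m2)
results <- lapply(mats, column_similarity)  # one 6x6 matrix per input

results$a["X19", "X23"]  # rows 1 and 3 match, so this is 2
```

Keeping the results in a named list means each similarity matrix can be looked up by the name of the matrix it came from.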

Here is a fully vectorized solution that uses expand.grid to build the column index pairs, colSums to count the matches, and matrix to reshape the result.

    # Some reproducible 6x6 sample data
    set.seed(1)
    m <- matrix( sample(10,36,repl=TRUE) , ncol = 6 )
    #     [,1] [,2] [,3] [,4] [,5] [,6]
    #[1,]    3   10    7    4    3    5
    #[2,]    4    7    4    8    4    6
    #[3,]    6    7    8   10    1    5
    #[4,]   10    1    5    3    4    2
    #[5,]    3    3    8    7    9    9
    #[6,]    9    2   10    2    4    7

    # Vector source for column combinations
    n <- seq_len( ncol(m) )

    # Make combinations
    id <- expand.grid( n , n )

    # Get result
    out <- matrix( colSums( m[ , id[,1] ] == m[ , id[,2] ] ) , ncol = length(n) )
    diag(out) <- 0
    #     [,1] [,2] [,3] [,4] [,5] [,6]
    #[1,]    0    1    1    0    2    0
    #[2,]    1    0    0    1    0    0
    #[3,]    1    0    0    0    1    0
    #[4,]    0    1    0    0    0    0
    #[5,]    2    0    1    0    0    1
    #[6,]    0    0    0    0    1    0
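Because the result is symmetric, the comparisons could also be restricted to the unordered column pairs and mirrored afterwards. The following is a rough sketch of that variation, assuming the matrix has at least two columns; `pairwise_matches` is a hypothetical name, and the small matrix is made up so the counts are easy to check by hand.

```r
# Hypothetical helper: same counts as the expand.grid approach, but each
# unordered pair of columns is compared only once.
pairwise_matches <- function(m) {
  p <- ncol(m)
  out <- matrix(0, nrow = p, ncol = p,
                dimnames = list(colnames(m), colnames(m)))
  pairs <- combn(p, 2)  # 2 x choose(p, 2) matrix of column indices, i < j
  counts <- colSums(m[, pairs[1, ], drop = FALSE] ==
                    m[, pairs[2, ], drop = FALSE])
  out[t(pairs)] <- counts                        # fill the upper triangle
  out[t(pairs[2:1, , drop = FALSE])] <- counts   # mirror into the lower one
  out
}

# Made-up data: columns are (1,2,3), (1,2,4), (5,2,3).
m <- matrix(c(1, 2, 3,
              1, 2, 4,
              5, 2, 3), ncol = 3)

res <- pairwise_matches(m)
res[1, 2]  # columns 1 and 2 agree in rows 1 and 2, so this is 2
```

The diagonal is never written to, so it stays at zero without a separate diag() step.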

An approach using v_outer from the qdapTools package:

    library(qdapTools)
    # Using the sample data m from the previous answer
    x <- v_outer(m, function(x, y) sum(x==y))
    diag(x) <- 0
    ##    V1 V2 V3 V4 V5 V6
    ## V1  0  1  1  0  2  0
    ## V2  1  0  0  1  0  0
    ## V3  1  0  0  0  1  0
    ## V4  0  1  0  0  0  0
    ## V5  2  0  1  0  0  1
    ## V6  0  0  0  0  1  0
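For readers who would rather not install a package, roughly the same result can be obtained in base R with outer() over the column indices. This is a sketch of an alternative, not a description of how v_outer works internally; `count_matches` and the toy data are made up for illustration.

```r
# Made-up toy data: three columns, three rows.
m <- cbind(A = c("x", "y", "z"),
           B = c("x", "y", "w"),
           C = c("q", "y", "z"))

idx <- seq_len(ncol(m))

# Hypothetical helper: number of rows where columns i and j agree.
count_matches <- function(i, j) sum(m[, i] == m[, j])

# outer() passes whole index vectors to its function, so the scalar helper
# is wrapped in Vectorize() to be applied one pair at a time.
x <- outer(idx, idx, Vectorize(count_matches))
diag(x) <- 0
dimnames(x) <- list(colnames(m), colnames(m))

x["A", "B"]  # "x" and "y" match in rows 1 and 2, so this is 2
```

Vectorize() calls the helper once per pair, so this is convenient rather than fast.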

EDIT: I added benchmarks:

    set.seed(1)
    matrix <- m <- matrix( sample(10,36,repl=TRUE) , ncol = 6 )

    MATRIX <- function(){
        n <- seq_len( ncol(m) )
        id <- expand.grid( n , n )
        out <- matrix( colSums( m[ , id[,1] ] == m[ , id[,2] ] ) , ncol = length(n) )
        diag(out) <- 0
        out
    }

    V_OUTER <- function(){
        x <- v_outer(m, function(x, y) sum(x==y))
        diag(x) <- 0
        x
    }

    APPLY <- function(){
        similarity.matrix<-apply(matrix,2,function(x)colSums(x==matrix))
        diag(similarity.matrix)<-0
        similarity.matrix
    }

    library(microbenchmark)
    (op <- microbenchmark(
        MATRIX(),
        V_OUTER(),
        APPLY() ,
    times=1000L))

    Unit: microseconds
          expr     min      lq  median      uq      max neval
      MATRIX() 243.980 264.972 277.101 286.898 1719.519  1000
     V_OUTER() 203.861 223.921 234.650 243.280 1579.570  1000
       APPLY()  96.566 108.228 112.893 118.025 1470.409  1000

Source: https://habr.com/ru/post/957932/
