Distance Mahalanobis in R

I found the mahalanobis.dist function in the StatMatch package ( http://cran.r-project.org/web/packages/StatMatch/StatMatch.pdf ), but that does not do exactly what I want. It seems that calculating the distance of the Mahalanobis from each observation in the data. To each observation in data.x

I would like to calculate the mahalanobis distance of one observation in the data. to all observations in data.x. Basically calculate the distance of the mahalanobis of one point to the "cloud" of points, if that makes sense. The type of getting the idea of ​​the probability of observation, which is part of another group of observations.

This person ( http://people.revoledu.com/kardi/tutorial/Similarity/MahalanobisDistance.html ) seems to be doing this, and I tried to reproduce his process in R, but it doesn't work when I get to the bottom of the equation :

mahaldist = sqrt((inversepooledcov %*% t(meandiffmatrix)) %*% meandiffmatrix) 

All the code I'm working with is here:

 a = rbind(c(2,2), c(2,5), c(6,5),c(7,3)) colnames(a) = c('x', 'y') b = rbind(c(6,5),c(3,4)) colnames(b) = c('x', 'y') acov = cov(a) bcov = cov(b) meandiff1 = mean(a[,1]) - mean(b[,1]) meandiff2 = mean(a[,2]) - mean(b[,2]) meandiffmatrix = rbind(c(meandiff1,meandiff2)) totaldata = dim(a)[1] + dim(b)[1] pooledcov = (dim(a)[1]/totaldata * acov) + (dim(b)[1]/totaldata * bcov) inversepooledcov = solve(pooledcov) mahaldist = sqrt((inversepooledcov %*% t(meandiffmatrix)) %*% meandiffmatrix) 
+6
source share
5 answers

How to use mahalanobis function in stats package:

  mahalanobis(x, center, cov, inverted = FALSE, ...) 
+5
source

I tried this from the same website you were looking at, and then came across this question. I managed to get the script to work, but I get a different result.

 #WORKING EXAMPLE #MAHALANOBIS DIST OF TWO MATRICES #define matrix mat1<-matrix(data=c(2,2,6,7,4,6,5,4,2,1,2,5,5,3,7,4,3,6,5,3),nrow=10) mat2<-matrix(data=c(6,7,8,5,5,5,4,7,6,4),nrow=5) #center data mat1.1<-scale(mat1,center=T,scale=F) mat2.1<-scale(mat2,center=T,scale=F) #cov matrix mat1.2<-cov(mat1.1,method="pearson") mat2.2<-cov(mat2.1,method="pearson") n1<-nrow(mat1) n2<-nrow(mat2) n3<-n1+n2 #pooled matrix mat3<-((n1/n3)*mat1.2) + ((n2/n3)*mat2.2) #inverse pooled matrix mat4<-solve(mat3) #mean diff mat5<-as.matrix((colMeans(mat1)-colMeans(mat2))) #multiply mat6<-t(mat5) %*% mat4 #multiply sqrt(mat6 %*% mat5) 

I think that the function mahalanobis() used to calculate the distances of the mahalanobis between individuals (rows) in the same matrix. The pairwise.mahalanobis() function from package(HDMD) can compare two or more matrices and give mahalanobis the distance between the matrices.

+5
source

Your way out before taking the square root:

 inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix [,1] [,2] x -0.004349227 -0.01304768 y 0.114529639 0.34358892 

I think you can get the square root of the number of negative numbers, so you have a NAN for negative elements:

  sqrt(inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix) [,1] [,2] x NaN NaN y 0.3384223 0.5861646 Warning message: In sqrt(inversepooledcov %*% t(meandiffmatrix) %*% meandiffmatrix) : NaNs produced 
+1
source

You can wrap the stats::mahalanobis below to display the mahalanobis distance matrix (mahalanobis pairwise distances):

 # x - data frame # cx - covariance matrix; if not provided, # it will be estimated from the data mah <- function(x, cx = NULL) { if(is.null(cx)) cx <- cov(x) out <- lapply(1:nrow(x), function(i) { mahalanobis(x = x, center = do.call("c", x[i, ]), cov = cx) }) return(as.dist(do.call("rbind", out))) } 

Then you can group your data and build it, for example:

 # Dummy data x <- data.frame(X = c(rnorm(10, 0), rnorm(10, 5)), Y = c(rnorm(10, 0), rnorm(10, 7)), Z = c(rnorm(10, 0), rnorm(10, 12))) rownames(x) <- LETTERS[1:20] plot(x, pch = LETTERS[1:20]) 

enter image description here

 # Comute the mahalanobis distance matrix d <- mah(x) d # Cluster and plot hc <- hclust(d) plot(hc) 

enter image description here

+1
source

There's a very easy way to do this with the R Package "biotools". In this case, you get the Mahalanobis matrix with the square of the distance.

 #Manly (2004, p.65-66) x1 <- c(131.37, 132.37, 134.47, 135.50, 136.17) x2 <- c(133.60, 132.70, 133.80, 132.30, 130.33) x3 <- c(99.17, 99.07, 96.03, 94.53, 93.50) x4 <- c(50.53, 50.23, 50.57, 51.97, 51.37) #size (nxp) #Means x <- cbind(x1, x2, x3, x4) #size (pxp) #Variances and Covariances Cov <- matrix(c(21.112,0.038,0.078,2.01, 0.038,23.486,5.2,2.844, 0.078,5.2,24.18,1.134, 2.01,2.844,1.134,10.154), 4, 4) library(biotools) Mahalanobis_Distance<-D2.dist(x, Cov) print(Mahalanobis_Distance) 
0
source

Source: https://habr.com/ru/post/953322/


All Articles