Calculate Mahalanobis distance using only NumPy

I am looking for a NumPy way of calculating the Mahalanobis distance between two numpy arrays (x and y). The following code can correctly calculate the same using the cdist Scipy function. Since this function calculates unnecessary matics in my case, I want to get a more direct way to calculate using only NumPy.

import numpy as np
from scipy.spatial.distance import cdist

x = np.array([[[1,2,3,4,5],
               [5,6,7,8,5],
               [5,6,7,8,5]],
              [[11,22,23,24,5],
               [25,26,27,28,5],
               [5,6,7,8,5]]])
i,j,k = x.shape

xx = x.reshape(i,j*k).T


y = np.array([[[31,32,33,34,5],
               [35,36,37,38,5],
               [5,6,7,8,5]],
              [[41,42,43,44,5],
               [45,46,47,48,5],
               [5,6,7,8,5]]])


yy = y.reshape(i,j*k).T

results =  cdist(xx,yy,'mahalanobis')
results = np.diag(results)
print results



[ 2.28765854  2.75165028  2.75165028  2.75165028  0.          2.75165028
  2.75165028  2.75165028  2.75165028  0.          0.          0.          0.
  0.          0.        ]

My test:

VI = np.linalg.inv(np.cov(xx,yy))

print np.sqrt(np.dot(np.dot((xx-yy),VI),(xx-yy).T))

Can anyone fix this method?

Here is the formula for it:

http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.mahalanobis.html#scipy.spatial.distance.mahalanobis

+4
source share
2 answers

I think your problem is building your covariance matrix. Try:

X = np.vstack([xx,yy])
V = np.cov(X.T)
VI = np.linalg.inv(V)
print np.diag(np.sqrt(np.dot(np.dot((xx-yy),VI),(xx-yy).T)))

Conclusion:

[ 2.28765854  2.75165028  2.75165028  2.75165028  0.          2.75165028
  2.75165028  2.75165028  2.75165028  0.          0.          0.          0.
  0.          0.        ]

, , , , C Python one:

A = np.dot((xx-yy),VI)
B = (xx-yy).T
n = A.shape[0]
D = np.empty(n)
for i in range(n):
    D[i] = np.sqrt(np.sum(A[i] * B[:,i]))

EDIT: , np.einsum voodoo Python ( 84,3 2,9 ):

D = np.sqrt(np.einsum('ij,ji->i', A, B))

: @Warren Weckesser, einsum A B:

delta = xx - yy
D = np.sqrt(np.einsum('nj,jk,nk->n', delta, VI, delta))
+9

, , einsum

e = xx-yy
X = np.vstack([xx,yy])
V = np.cov(X.T) 
p = np.linalg.inv(V)
D = np.sqrt(np.sum(np.dot(e,p) * e, axis = 1))
0

Source: https://habr.com/ru/post/1569277/


All Articles