scipy's pdist is a collection of different functions, so there is no Theano equivalent for all of them at once. However, each specific distance is a closed-form mathematical expression, so it can be written in Theano as such and then compiled.
Take, for example, the Minkowski p-distance (copy and paste):
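```python
import theano
import theano.tensor as T

X = T.fmatrix('X')
Y = T.fmatrix('Y')
P = T.scalar('P')

# (nX, 1, d) - (1, nY, d) broadcasts to an (nX, nY, d) array of difference vectors
translation_vectors = X.reshape((X.shape[0], 1, -1)) - Y.reshape((1, Y.shape[0], -1))
# sum |.|^P over the feature axis, then take the P-th root -> (nX, nY) matrix
minkowski_distances = (abs(translation_vectors) ** P).sum(2) ** (1. / P)

f_minkowski = theano.function([X, Y, P], minkowski_distances)
```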
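To make the shapes in `translation_vectors` concrete, here is a minimal NumPy sketch of the same expression. NumPy stands in for Theano so it runs without compilation, and `minkowski_matrix` is a hypothetical name, not part of either library. Reshaping X to (nX, 1, d) and Y to (1, nY, d) makes their difference broadcast to (nX, nY, d); summing over axis 2 leaves an (nX, nY) matrix.

```python
import numpy as np

def minkowski_matrix(X, Y, p):
    """Pairwise Minkowski p-distances between the rows of X and the rows of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # (nX, 1, d) - (1, nY, d) broadcasts to (nX, nY, d)
    translation_vectors = X[:, np.newaxis, :] - Y[np.newaxis, :, :]
    # reduce over the feature axis d, leaving an (nX, nY) matrix
    return (np.abs(translation_vectors) ** p).sum(axis=2) ** (1.0 / p)

X = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.array([[3.0, 4.0]])
print(minkowski_matrix(X, Y, 1))  # Manhattan distances 7 and 5
print(minkowski_matrix(X, Y, 2))  # Euclidean distances 5 and sqrt(13)
```

Note that the whole (nX, nY, d) intermediate is held in memory at once, which matters for the p=2 discussion further down.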
Note that abs calls the built-in `__abs__`, so abs is also a Theano function here. Now we can compare this with pdist:
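```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.RandomState(42)
d = 20                   # dimension
nX, nY = 10, 30          # sample sizes (illustrative)
x = rng.randn(nX, d).astype(np.float32)  # f_minkowski expects float32 (fmatrix)
y = rng.randn(nY, d).astype(np.float32)

for p in [1., 3., 2.]:
    # keep the upper triangle of the full matrix: pdist's condensed layout
    d_theano = f_minkowski(x, x, p)[np.triu_indices(nX, 1)]
    d_scipy = pdist(x, 'minkowski', p=p)
    print("Testing p=%1.2f, discrepancy %1.3e" % (p, np.sqrt(((d_theano - d_scipy) ** 2).sum())))
```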
This gives

```
Testing p=1.00, discrepancy 1.322e-06
Testing p=3.00, discrepancy 4.277e-07
Testing p=2.00, discrepancy 4.789e-07
```
As you can see, the results agree, but `f_minkowski` is somewhat more general: it computes the distances between the rows of two possibly different arrays. If the same array is passed twice, `f_minkowski` returns the full square distance matrix, while pdist returns a condensed vector without redundancy. If that behavior is desired, it can also be implemented in Theano, but I will stick to the general case here.
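pdist's condensed output lists each pair (i, j) with i < j exactly once, row by row, which is the same order in which `np.triu_indices(n, 1)` enumerates the strict upper triangle. Here is a minimal NumPy sketch of converting the square matrix that `f_minkowski(x, x, p)` returns into that layout; `to_condensed` is a hypothetical helper name, and `scipy.spatial.distance.squareform` performs the equivalent conversion.

```python
import numpy as np

def to_condensed(D):
    """Square, symmetric distance matrix -> condensed vector in pdist's order."""
    n = D.shape[0]
    # row indices i and column indices j of all pairs with i < j, in row-major order
    i, j = np.triu_indices(n, k=1)
    return D[i, j]

D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
print(to_condensed(D))  # pairs (0,1), (0,2), (1,2) -> 1, 2, 3
```

For n points the condensed vector has n(n-1)/2 entries, which is why pdist avoids storing the redundant lower triangle and zero diagonal.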
One specialization is worth noting: for p=2 the computation simplifies via the binomial formula, and this can be used to save precious memory. The general Minkowski distance as implemented above creates a 3D array (a consequence of avoiding for-loops and summing along the last axis), which can be prohibitively large depending on d (and nX, nY). For p=2 we can instead write
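```python
# |x - y|^2 = |x|^2 + |y|^2 - 2 x.y for every pair of rows, without a 3D intermediate
squared_euclidean_distances = ((X ** 2).sum(1).reshape((X.shape[0], 1))
                               + (Y ** 2).sum(1).reshape((1, Y.shape[0]))
                               - 2 * X.dot(Y.T))
f_euclidean = theano.function([X, Y], T.sqrt(squared_euclidean_distances))
```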
which uses O(nX * nY) memory instead of O(nX * nY * d). We check for agreement, this time on the general case of two different arrays:
```python
d_eucl = f_euclidean(x, y)
d_minkowski2 = f_minkowski(x, y, 2.)
print("Comparing f_minkowski, p=2 and f_euclidean: l2-discrepancy %1.3e"
      % ((d_eucl - d_minkowski2) ** 2).sum())
```
which gives

```
Comparing f_minkowski, p=2 and f_euclidean: l2-discrepancy 1.464e-11
```
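The p=2 shortcut rests on the expansion |x - y|^2 = |x|^2 + |y|^2 - 2 x·y, applied to every pair of rows at once: the two squared-norm vectors broadcast against each other, and all cross terms come from a single matrix product. Here is a minimal NumPy sketch of the same computation (NumPy substituted for Theano; `euclidean_matrix` is a hypothetical name). It adds one detail the Theano version above does not have: rounding can push entries slightly below zero (for example for identical rows), so they are clipped before the square root to avoid NaNs.

```python
import numpy as np

def euclidean_matrix(X, Y):
    """Pairwise Euclidean distances via |x-y|^2 = |x|^2 + |y|^2 - 2 x.y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq = ((X ** 2).sum(axis=1)[:, np.newaxis]    # (nX, 1)
          + (Y ** 2).sum(axis=1)[np.newaxis, :]  # (1, nY)
          - 2.0 * X.dot(Y.T))                    # (nX, nY); no (nX, nY, d) array
    # rounding can leave tiny negative values; clip them before the square root
    return np.sqrt(np.maximum(sq, 0.0))

rng = np.random.RandomState(0)
x = rng.randn(5, 3)
y = rng.randn(4, 3)
# reference: the memory-hungry broadcasting version, for comparison only
ref = np.sqrt(((x[:, np.newaxis, :] - y[np.newaxis, :, :]) ** 2).sum(axis=2))
print(np.abs(euclidean_matrix(x, y) - ref).max())  # only a small rounding difference
```

To put the memory saving in numbers: with nX = nY = d = 1000 in float32, the 3D intermediate of the general Minkowski version holds 10^9 values (about 4 GB), while the squared-distance matrix holds 10^6 values (about 4 MB).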