I'm not sure how big your arrays are, but the following is equivalent:
R = np.einsum('ij,kj',A,A)
And it can be quite a lot faster and use significantly less memory:
In [7]: A = np.random.random(size=(500,400))
In [8]: %timeit R = (A[:,np.newaxis,:] * A[np.newaxis,:,:]).sum(2)
1 loops, best of 3: 1.21 s per loop
In [9]: %timeit R = np.einsum('ij,kj',A,A)
10 loops, best of 3: 54 ms per loop
If I increase the size of A to (500,4000), np.einsum runs the calculation in about 2 seconds, while the original formulation grinds my machine to a halt because of the size of the temporary array it has to create.
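To make the comparison concrete, here is a small sketch that checks the two formulations agree on a reduced input and estimates how large the broadcast temporary would be. The helper `temp_bytes` and the small array shape are my own additions for illustration, and the estimate assumes 8-byte float64 elements.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((50, 40))  # small stand-in so the broadcast version stays cheap

# Original formulation: builds a (rows, rows, cols) temporary before summing.
R_broadcast = (A[:, np.newaxis, :] * A[np.newaxis, :, :]).sum(2)

# einsum formulation: sums over j without materialising that temporary.
R_einsum = np.einsum('ij,kj', A, A)

same = np.allclose(R_broadcast, R_einsum)

def temp_bytes(n_rows, n_cols, itemsize=8):
    """Size of the (n_rows, n_rows, n_cols) temporary, assuming float64."""
    return n_rows * n_rows * n_cols * itemsize

print(same)
print(temp_bytes(500, 400) / 1e9, "GB for shape (500, 400)")
print(temp_bytes(500, 4000) / 1e9, "GB for shape (500, 4000)")
```

Under that assumption the temporary grows from about 0.8 GB to about 8 GB when the second dimension goes from 400 to 4000, which fits the slowdown described above.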
Update
As @Jaime noted in the comments, np.dot(A, A.T) is also an equivalent formulation of the problem and can be even faster than the np.einsum solution. Full credit to him for pointing this out, but in case he doesn't post it as an answer of his own, I wanted to pull it into the main answer.
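A quick way to convince yourself that the two agree is the sketch below. The small random array is just an illustrative stand-in, and I haven't included timings since they depend on the machine and the BLAS library NumPy is linked against.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((50, 40))

R_einsum = np.einsum('ij,kj', A, A)
R_dot = np.dot(A, A.T)  # matrix product of A with its transpose

dot_matches = np.allclose(R_einsum, R_dot)
print(dot_matches)
```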