The other answer already gives you some very reasonable approaches. I would like to add an
np.dot-based approach to the mix, which also uses some reshaping.
Here is one way to do this -
k,m,n = A.shape
out = ((A.transpose(2,0,1).reshape(-1,m).dot(B.T)).reshape(n,-1)).dot(A.reshape(-1,n)).T
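To make the reshaping easier to follow, here is a minimal sketch that unrolls the one-liner into named steps and checks it against a plain loop. It assumes, as the timed expressions further down suggest, that the target is the sum over i of A[i].T @ B @ A[i]. The function name sum_AtBA_dot, the intermediate names and the small random test inputs are made up for illustration.

```python
import numpy as np

def sum_AtBA_dot(A, B):
    """Sum over i of A[i].T @ B @ A[i], using two 2D matrix products.

    Hypothetical helper: the name and the step-by-step layout are
    illustrative, not taken from the original answer.
    """
    k, m, n = A.shape
    # (k, m, n) -> (n, k, m) -> (n*k, m): row (j, i) holds column j of A[i]
    A2D = A.transpose(2, 0, 1).reshape(-1, m)
    # After multiplying by B.T, row (j, i) holds B @ A[i][:, j]
    left = A2D.dot(B.T)
    # (n*k, m) -> (n, k*m): merge the i and m axes so one dot sums over both
    left = left.reshape(n, -1)
    # A.reshape(-1, n) is (k*m, n); the product is the transposed result
    return left.dot(A.reshape(-1, n)).T

# Small float inputs (assumed sizes) and a deliberately non-symmetric B,
# so that a B vs. B.T mix-up would show up in the comparison.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 2))
B = rng.standard_normal((3, 3))

reference = sum(A[i].T.dot(B).dot(A[i]) for i in range(A.shape[0]))
result = sum_AtBA_dot(A, B)
matches = np.allclose(reference, result)
```

The point of the two reshapes is that each large product becomes a single 2D dot call, which NumPy can hand to BLAS for floating-point inputs.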
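Runtime tests, comparing the earlier approaches (including those from the other answer) with the np.dot-based one proposed here, which is the last timing (In [134]) -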
In [130]: k = 100
...: m = 50
...: n = 10
...: A = np.arange(k*m*n).reshape(k, m, n)
...: B = np.arange(m*m).reshape(m, m)
In [131]: %timeit np.einsum('nij,njk->ik', np.einsum('nij,jk->nik', A.transpose(0,2,1), B), A)
100 loops, best of 3: 10.7 ms per loop
In [132]: %timeit np.einsum('nij, il, kln ->jk', A, B, A.T)
10 loops, best of 3: 105 ms per loop
In [133]: %timeit np.tensordot(A, np.tensordot(A, B, axes=(1, 1)), axes=((0, 1), (0, 2)))
100 loops, best of 3: 6.22 ms per loop
In [134]: %timeit ((A.transpose(2,0,1).reshape(-1,m).dot(B.T)).reshape(n,-1)).dot(A.reshape(-1,n)).T
100 loops, best of 3: 5.3 ms per loop
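Since timings are only comparable if every expression computes the same thing, a quick equivalence check can be useful. The sketch below evaluates the four timed expressions on small random float inputs and compares them. The sizes, the seed and the dictionary keys are assumptions chosen for illustration, and the spaces inside the einsum subscripts have been dropped.

```python
import numpy as np

# Small assumed sizes; B is non-symmetric so transposition errors are visible.
rng = np.random.default_rng(0)
k, m, n = 6, 5, 4
A = rng.standard_normal((k, m, n))
B = rng.standard_normal((m, m))

# The four expressions from the timing session, keyed by illustrative labels.
results = {
    "einsum_two_step": np.einsum(
        'nij,njk->ik',
        np.einsum('nij,jk->nik', A.transpose(0, 2, 1), B),
        A,
    ),
    "einsum_one_call": np.einsum('nij,il,kln->jk', A, B, A.T),
    "tensordot": np.tensordot(
        A, np.tensordot(A, B, axes=(1, 1)), axes=((0, 1), (0, 2))
    ),
    "dot_reshape": (
        (A.transpose(2, 0, 1).reshape(-1, m).dot(B.T)).reshape(n, -1)
    ).dot(A.reshape(-1, n)).T,
}

# Compare everything against the first expression.
reference = results["einsum_two_step"]
all_match = all(np.allclose(reference, r) for r in results.values())
```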