Let's play with a few small arrays:
    In [110]: a=np.arange(2*3*4).reshape(2,3,4)
    In [111]: b=np.arange(2*3).reshape(2,3)
    In [112]: np.einsum('ijx,ij->ix',a,b)
    Out[112]: 
    array([[ 20,  23,  26,  29],
           [200, 212, 224, 236]])
    In [113]: np.diagonal(np.dot(b,a)).T
    Out[113]: 
    array([[ 20,  23,  26,  29],
           [200, 212, 224, 236]])
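To spell out what the subscripts 'ijx,ij->ix' mean: j appears in both inputs but not in the output, so it is summed away, while i appears everywhere and acts as a batch index. A minimal sketch with explicit loops and with plain broadcasting (the helper name `einsum_loop` is just for illustration):

```python
import numpy as np

a = np.arange(2*3*4).reshape(2, 3, 4)
b = np.arange(2*3).reshape(2, 3)

def einsum_loop(a, b):
    """Hypothetical helper: explicit-loop version of np.einsum('ijx,ij->ix', a, b)."""
    I, J, X = a.shape
    out = np.zeros((I, X), dtype=a.dtype)
    for i in range(I):
        for x in range(X):
            for j in range(J):  # j is the summed index
                out[i, x] += a[i, j, x] * b[i, j]
    return out

# Broadcasting form: give b a trailing axis, multiply, then sum over j (axis 1)
broadcast = (a * b[:, :, None]).sum(axis=1)

result = np.einsum('ijx,ij->ix', a, b)
```

The loops are only there to make the index bookkeeping explicit; for real work the einsum or broadcasting form is what you would use.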
np.dot sums over the last dimension of the first array and the second-to-last dimension of the second. So I have to swap the arguments so that the size-3 dimensions line up. dot(b,a) creates a (2,2,4) array; diagonal selects 2 of those 'rows' (the ones where the two size-2 indices are equal) and puts them on the last axis, so a transpose restores the (2,4) layout. Another einsum expresses that cleanup nicely:
    In [122]: np.einsum('iik->ik',np.dot(b,a))
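A small check of the np.dot and diagonal behaviour described above, using the same a and b. It verifies one entry of dot(b,a) by hand and shows that both cleanups (diagonal plus transpose, and the 'iik->ik' einsum) recover the original einsum result:

```python
import numpy as np

a = np.arange(2*3*4).reshape(2, 3, 4)
b = np.arange(2*3).reshape(2, 3)

# N-d np.dot: sum over the last axis of b and the second-to-last axis of a,
# giving d[i, k, m] = sum_j b[i, j] * a[k, j, m]
d = np.dot(b, a)

# Check one entry by hand
i, k, m = 1, 0, 2
manual = sum(b[i, j] * a[k, j, m] for j in range(3))

# Only the i == k entries are wanted. diagonal() over axes 0 and 1 appends the
# diagonal index last, giving shape (4, 2), hence the transpose.
diag_t = np.diagonal(d).T
diag_e = np.einsum('iik->ik', d)   # the same cleanup written as an einsum

target = np.einsum('ijx,ij->ix', a, b)
```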
Since np.dot creates a larger array than the one the original einsum produces, it is unlikely to be faster, even if the underlying C code is tighter.
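To make the size argument concrete, here is a sketch with a larger, arbitrarily chosen batch dimension n. np.dot(b, a) builds an (n, n, X) array and diagonal keeps only n*X of its entries, so the intermediate is n times larger than the result. The timing lines are only a rough check; the actual numbers will vary with machine, BLAS and NumPy version.

```python
import timeit

import numpy as np

# Arbitrary larger sizes, chosen only to make the batch dimension n noticeable
n, J, X = 200, 3, 4
rng = np.random.default_rng(0)
a = rng.random((n, J, X))
b = rng.random((n, J))

full = np.dot(b, a)                        # (n, n, X): every i paired with every k
wanted = np.einsum('ijx,ij->ix', a, b)     # (n, X): only the i == k pairs
print(full.shape, wanted.shape)            # (200, 200, 4) (200, 4)

# Rough timings only; results depend on the machine and NumPy build
t_dot = timeit.timeit(lambda: np.diagonal(np.dot(b, a)).T, number=100)
t_ein = timeit.timeit(lambda: np.einsum('ijx,ij->ix', a, b), number=100)
print(f"dot + diagonal: {t_dot:.4f} s, einsum: {t_ein:.4f} s")
```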
(Curiously, I had trouble replicating np.dot(b,a) with einsum; I couldn't get it to generate that (2,2,...) array.)
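For what it's worth, giving the shared size-3 axis one subscript and the two size-2 axes separate subscripts does seem to reproduce np.dot(b, a) with einsum. A quick check, with the subscript string being my own choice rather than anything from the original attempt:

```python
import numpy as np

a = np.arange(2*3*4).reshape(2, 3, 4)
b = np.arange(2*3).reshape(2, 3)

# i indexes b's rows and k indexes a's first axis. Using different letters
# keeps them independent, so einsum returns the full (2, 2, 4) array that
# np.dot builds, instead of only the i == k entries.
outer = np.einsum('ij,kjm->ikm', b, a)
print(outer.shape)   # (2, 2, 4)
```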
For the a,a case we have to do something similar: roll the axes of one array so that its last dimension pairs with the second-to-last dimension of the other, do the dot, and then clean up with diagonal and transpose:
    In [157]: np.einsum('ijx,ijy->ixy',a,a).shape
    Out[157]: (2, 4, 4)
    In [158]: np.einsum('ijjx->jix',np.dot(np.rollaxis(a,2),a))
    In [176]: np.diagonal(np.dot(np.rollaxis(a,2),a),0,2).T
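A check that both cleanups of the rolled dot agree with the direct einsum. One subtlety: .T reverses all axes, so the diagonal route actually returns the (i, y, x) ordering. That is harmless here only because the a,a result is symmetric in its last two axes; with two different arrays it would need a different transpose.

```python
import numpy as np

a = np.arange(2*3*4).reshape(2, 3, 4)

target = np.einsum('ijx,ijy->ixy', a, a)      # (2, 4, 4)

# rollaxis moves axis 2 to the front: r[x, i, j] = a[i, j, x], shape (4, 2, 3).
# Its last axis (j) now meets the second-to-last axis of a in np.dot.
r = np.rollaxis(a, 2)
d = np.dot(r, a)                              # d[x, i, k, y], shape (4, 2, 2, 4)

# Keep only i == k and reorder to (i, x, y)
via_einsum = np.einsum('ijjx->jix', d)

# diagonal over axes 2 and 1 gives shape (4, 4, 2) indexed [x, y, i];
# .T turns that into [i, y, x], which equals target because it is symmetric
via_diag = np.diagonal(d, 0, 2).T
```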
tensordot is another way of taking a dot product over selected axes:
    np.tensordot(a,a,(1,1))
    np.diagonal(np.rollaxis(np.tensordot(a,a,(1,1)),1),0,2).T  # with cleanup
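A quick check that the tensordot route gives the same result. np.moveaxis is the function NumPy's documentation recommends over np.rollaxis, and for moving one axis to the front the two are equivalent. Since tensordot already leaves the axes as (i, x, k, y), an einsum cleanup can also skip the roll and the transpose entirely:

```python
import numpy as np

a = np.arange(2*3*4).reshape(2, 3, 4)
target = np.einsum('ijx,ijy->ixy', a, a)

# Contract axis 1 of the first a with axis 1 of the second: t[i, x, k, y]
t = np.tensordot(a, a, (1, 1))                       # shape (2, 4, 2, 4)

# The cleanup from the text, plus the same thing spelled with moveaxis
cleaned = np.diagonal(np.rollaxis(t, 1), 0, 2).T
cleaned_moveaxis = np.diagonal(np.moveaxis(t, 1, 0), 0, 2).T

# Keep only i == k directly; the output is already in (i, x, y) order
cleaned_einsum = np.einsum('ixiy->ixy', t)
```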