I built numpy 1.6.2 and scipy against MKL, hoping to get better performance. Currently I have code that relies heavily on np.einsum(), and I was told that einsum does not benefit much from MKL, because it does almost no vectorized/BLAS work. So I was thinking of rewriting some of my code with np.dot() and slicing, just to get some multi-core speed, even though I really like the simplicity and readability of np.einsum(). For example, I have a multidimensional matrix multiplication of the form:
np.einsum('mi,mnijqk->njqk',A,B)
So, how do I convert operations like this one, or other multiplications over 3-, 4- and 5-dimensional arrays, into efficient np.dot() calls that MKL can accelerate?
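To make it concrete, here is the kind of equivalence I am after, on a small example (the sizes M, N, I, J, Q, K are just made up for illustration). As far as I understand, np.tensordot reshapes its arguments and calls np.dot internally, so it should go through the MKL-backed BLAS:

import numpy as np

M, N, I, J, Q, K = 4, 5, 6, 3, 2, 7          # made-up sizes
A = np.random.rand(M, I)
B = np.random.rand(M, N, I, J, Q, K)

out_einsum = np.einsum('mi,mnijqk->njqk', A, B)

# same contraction: sum A's axes (m, i) against B's axes 0 and 2;
# the remaining axes of B come out in the order n, j, q, k
out_dot = np.tensordot(A, B, axes=([0, 1], [0, 2]))

print(np.allclose(out_einsum, out_dot))      # True

I don't see how to do this systematically for more complicated expressions, though.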
Let me add some more information: I am computing this equation:
DX_h[n,j,k] = sum over m and i of  a[m,n] * exp(b[m,n] * U[n,i]) * P[n,i,j] * P[n,i,k] * X[m,i]
For this, I use the code:
np.einsum('mn,mni,nij,nik,mi->njk',a,np.exp(b[:,:,np.newaxis]*U[np.newaxis,:,:]),P,P,X)
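This is the kind of rewrite with np.dot() and slicing I had in mind for that expression; it is only a sketch, with the intermediate names E and S my own, and shapes as implied by the einsum (a, b: (M, N); U: (N, I); P: (N, I, I); X: (M, I)):

N, I = U.shape
E = np.exp(b[:, :, np.newaxis] * U[np.newaxis, :, :])               # (M, N, I)
# contract over m first: S[n, i] = sum_m a[m, n] * E[m, n, i] * X[m, i]
S = (a[:, :, np.newaxis] * E * X[:, np.newaxis, :]).sum(axis=0)     # (N, I)
# then one small matrix product per n (this dot is what MKL should pick up):
# DX_h[n, j, k] = sum_i P[n, i, j] * S[n, i] * P[n, i, k]
DX_h = np.empty((N, I, I))
for n in range(N):
    DX_h[n] = np.dot(P[n].T, S[n][:, np.newaxis] * P[n])

I am not sure this is the best decomposition, which is part of what I am asking.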
Just in case it matters, I also wrote the same calculation in Cython; it turned out to be about 5 times slower than the einsum call:
#STACKOVERFLOW QUESTION:
from __future__ import division
import numpy as np
cimport numpy as np
cimport cython

cdef extern from "math.h":
    double exp(double x)

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

@cython.boundscheck(False)  # turn off bounds-checking for the entire function
def cython_DX_h(np.ndarray[DTYPE_t, ndim=3] P, np.ndarray[DTYPE_t, ndim=2] a, np.ndarray[DTYPE_t, ndim=2] b, np.ndarray[DTYPE_t, ndim=2] U, np.ndarray[DTYPE_t, ndim=2] X, int N, int I, int M):
    assert P.dtype == DTYPE and a.dtype == DTYPE and b.dtype == DTYPE and U.dtype == DTYPE and X.dtype == DTYPE
    # DX_h[n, j, k] = sum_{m, i} a[m, n] * exp(b[m, n] * U[n, i]) * P[n, i, j] * P[n, i, k] * X[m, i]
    cdef np.ndarray[DTYPE_t, ndim=3] DX_h = np.zeros((N, I, I), dtype=DTYPE)
    cdef unsigned int j, n, k, m, i
    cdef double aux
    for n in range(N):
        for j in range(I):
            for k in range(I):
                aux = 0
                for m in range(M):
                    for i in range(I):
                        aux += a[m, n] * exp(b[m, n] * U[n, i]) * P[n, i, j] * P[n, i, k] * X[m, i]
                DX_h[n, j, k] = aux
    return DX_h
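This is roughly how I compared the Cython function with the einsum version; the sizes and the module name my_cython_module are placeholders (the latter is just whatever the compiled .pyx ends up being called):

import numpy as np
from timeit import default_timer as timer
from my_cython_module import cython_DX_h    # placeholder name for the compiled extension

M, N, I = 50, 60, 40                        # made-up sizes, only for the comparison
a = np.random.rand(M, N); b = np.random.rand(M, N)
U = np.random.rand(N, I); X = np.random.rand(M, I)
P = np.random.rand(N, I, I)

E = np.exp(b[:, :, np.newaxis] * U[np.newaxis, :, :])

t0 = timer()
ref = np.einsum('mn,mni,nij,nik,mi->njk', a, E, P, P, X)
t1 = timer()
out = cython_DX_h(P, a, b, U, X, N, I, M)
t2 = timer()

print(np.allclose(ref, out))
print('einsum: %.4f s   cython: %.4f s' % (t1 - t0, t2 - t1))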
Is it possible to speed up the Cython version (or is there a better way, e.g. rewriting everything in terms of np.dot so that MKL does the work)? I have also looked at prange in Cython, but I don't understand how to deal with the gil/nogil part there.