According to NVIDIA, cublasZgemm is 6 times faster than Intel MKL.
However, on my PC (Core i7-2600, NVIDIA GTX 560, OS: 64-bit Linux) cublasZgemm is slightly slower than MKL.
I use numpy.dot() from the Enthought Python Distribution, where NumPy is linked against MKL 10.3.
The matrix multiplication function that calls cublasZgemm is compiled into a shared library and invoked from a Python script via ctypes.
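For context, here is a minimal sketch of the kind of wrapper I mean (the function name zgemm_gpu, the build command, and the exact allocation/copy pattern are illustrative rather than my exact code). Note that the host-to-device and device-to-host copies happen inside the wrapper, so they are included in the time measured from Python:

    // build (roughly): nvcc -Xcompiler -fPIC -shared zgemm_gpu.cu -o libzgemm_gpu.so -lcublas
    #include <cublas_v2.h>
    #include <cuComplex.h>
    #include <cuda_runtime.h>

    // Computes C = A * B for n x n column-major double-complex matrices.
    extern "C" int zgemm_gpu(int n, const cuDoubleComplex *A,
                             const cuDoubleComplex *B, cuDoubleComplex *C)
    {
        cuDoubleComplex *dA, *dB, *dC;
        size_t bytes = (size_t)n * n * sizeof(cuDoubleComplex);

        // Device buffers and host-to-device transfers (counted in the timing)
        cudaMalloc((void **)&dA, bytes);
        cudaMalloc((void **)&dB, bytes);
        cudaMalloc((void **)&dC, bytes);
        cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);

        cuDoubleComplex alpha = make_cuDoubleComplex(1.0, 0.0);
        cuDoubleComplex beta  = make_cuDoubleComplex(0.0, 0.0);

        // C = alpha * A * B + beta * C, no transposes
        cublasZgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, dA, n, dB, n, &beta, dC, n);

        // Copy the result back to the host (also counted in the timing)
        cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost);

        cublasDestroy(handle);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }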
When multiplying two 1024x1024 complex matrices, numpy.dot() took 84 ms. The ctypes call took 110 ms in total, of which the cublasZgemm() part took 97 ms.
Why is cublasZgemm not as fast as NVIDIA states?