I use several methods ( NumPy , Weave and Cython ) to run a Python performance test. Mathematically, mathematically, the code has the value C = AB , where A, B, and C are N x N matrices ( NOTE: this is a matrix product, not an element multiplication).
I wrote 5 different code implementations:
- Pure Python (Loop over 2D Python lists)
- NumPy (point product of 2D NumPy arrays)
- Weaving inline (C ++ loop over two-dimensional arrays)
- Cython (Loop over 2D Python lists + static typing)
- Cython-Numpy (Loop over 2D NumPy Arrays + Static Typing)
My expectation is that implementations 2 through 5 will be significantly faster than implementation 1. My results, however, point to something else. These are my normalized acceleration results for a relatively clean Python implementation:
- python_list: 1.00
- numpy_array: 330.09
- weave_inline: 30.72
- cython_list: 2.80
- cython_array: 0.14
I am very pleased with NumPy's performance, however I am less enthusiastic about Weave's performance and Cython's performance makes me cry. All my code is split into two files. Everything is automated, and you just need to run the first file to see all the results. Can someone help me by pointing out what I can do to get the best results?
matmul.py:
import time import numpy as np from scipy import weave from scipy.weave import converters import pyximport pyximport.install() import cython_matmul as cml def python_list_matmul(A, B): C = np.zeros(A.shape, dtype=float).tolist() A = A.tolist() B = B.tolist() for k in xrange(len(A)): for i in xrange(len(A)): for j in xrange(len(A)): C[i][k] += A[i][j] * B[j][k] return C def numpy_array_matmul(A, B): return np.dot(A, B) def weave_inline_matmul(A, B): code = """ int i, j, k; for (k = 0; k < N; ++k) { for (i = 0; i < N; ++i) { for (j = 0; j < N; ++j) { C(i, k) += A(i, j) * B(j, k); } } } """ C = np.zeros(A.shape, dtype=float) weave.inline(code, ['A', 'B', 'C', 'N'], type_converters=converters.blitz, compiler='gcc') return C N = 100 A = np.random.rand(N, N) B = np.random.rand(N, N) function = [] function.append([python_list_matmul, 'python_list']) function.append([numpy_array_matmul, 'numpy_array']) function.append([weave_inline_matmul, 'weave_inline']) function.append([cml.cython_list_matmul, 'cython_list']) function.append([cml.cython_array_matmul, 'cython_array']) t = [] for i in xrange(len(function)): t1 = time.time() C = function[i][0](A, B) t2 = time.time() t.append(t2 - t1) print function[i][1] + ' \t: ' + '{:10.6f}'.format(t[0] / t[-1])
cython_matmul.pyx:
import numpy as np cimport numpy as np import cython cimport cython DTYPE = np.float ctypedef np.float_t DTYPE_t @cython.boundscheck(False) @cython.wraparound(False) @cython.nonecheck(False) cpdef cython_list_matmul(A, B): cdef int i, j, k cdef int N = len(A) A = A.tolist() B = B.tolist() C = np.zeros([N, N]).tolist() for k in xrange(N): for i in xrange(N): for j in xrange(N): C[i][k] += A[i][j] * B[j][k] return C @cython.boundscheck(False) @cython.wraparound(False) @cython.nonecheck(False) cpdef cython_array_matmul(np.ndarray[DTYPE_t, ndim=2] A, np.ndarray[DTYPE_t, ndim=2] B): cdef int i, j, k, N = A.shape[0] cdef np.ndarray[DTYPE_t, ndim=2] C = np.zeros([N, N], dtype=DTYPE) for k in xrange(N): for i in xrange(N): for j in xrange(N): C[i][k] += A[i][j] * B[j][k] return C