Optimizing NumPy with Cython

Question

I'm currently trying to optimize code that I wrote in pure Python. The code relies heavily on NumPy, since I work with NumPy arrays. Below is the simplest of my classes, which I have converted to Cython. It only multiplies two NumPy arrays:

    bendingForces = self.matrixPrefactor * membraneHeight

My question is whether and how I can optimize this, because when I look at the C code generated by "cython -a", there are many NumPy calls that don't look very efficient.

    import numpy as np
    cimport numpy as np

    ctypedef np.float64_t dtype_t
    ctypedef np.complex128_t cplxtype_t
    ctypedef Py_ssize_t index_t

    cdef class bendingForcesClass( object ):
        cdef dtype_t bendingRigidity
        cdef np.ndarray matrixPrefactor
        cdef np.ndarray bendingForces

        def __init__( self, dtype_t bendingRigidity, np.ndarray[dtype_t, ndim=2] waveNumbersNorm ):
            self.bendingRigidity = bendingRigidity
            self.matrixPrefactor = -self.bendingRigidity * waveNumbersNorm**2

        cpdef np.ndarray calculate( self, np.ndarray membraneHeight ) :
            cdef np.ndarray bendingForces
            bendingForces = self.matrixPrefactor * membraneHeight
            return bendingForces

The idea I had was to use two for loops and iterate over the entries of the arrays. Perhaps the compiler could then optimize this with SIMD operations?! I tried this, and it compiled, but it gave strange results and took forever. Here is the code of the replacement function:

    cpdef np.ndarray calculate( self, np.ndarray membraneHeight ) :
        cdef index_t index1, index2 # corresponds to: cdef Py_ssize_t index1, index2
        for index1 in range( self.matrixSize ):
            for index2 in range( self.matrixSize ):
                self.bendingForces[ index1, index2 ] = self.matrixPrefactor.data[ index1, index2 ] * membraneHeight.data[ index1, index2 ]
        return self.bendingForces

This code, as I said, is very slow and does not work properly. So what am I doing wrong? What would be the best way to optimize this and remove NumPy calls?

optimization python numpy matrix-multiplication cython

packoman Mar 16 '11 at 20:21

2 answers

You can probably speed things up by using

    for index1 from 0 <= index1 < max1:

instead of a range loop, since I am not sure whether the loop variable is typed in that case.

Have you checked this option and this one?

Joost rekveld Mar 20 '11 at 0:12

highBandWidth · Accepted Answer · 2011-03-17T12:56:33+0000

For simple matrix operations like the one you are using, NumPy already does only the looping and multiplying natively, so it would be hard to beat that in Cython. Cython is great for situations where you replace loops in Python with loops in Cython. One of the reasons your code is slower than NumPy is that every time you index into your arrays,

    self.bendingForces[ index1, index2 ] = self.matrixPrefactor.data[ index1, index2 ] * membraneHeight.data[ index1, index2 ]

it does extra work, such as bounds checking (verifying that the index is valid). If you cast your indices to unsigned ints, you can use the @cython.boundscheck(False) decorator before the function.

For more information on speeding up Cython code, see the tutorial.
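To illustrate the first point of this answer, here is a minimal sketch in plain Python with NumPy (not Cython). The sample arrays and the helper multiply_loop are hypothetical stand-ins, not taken from the question. It shows that a single vectorized call computes the same element-wise product as the explicit double loop, and that np.multiply with its out argument can write into a preallocated array if you want to avoid allocating a new result on every call.

```python
import numpy as np


def multiply_loop(prefactor, height):
    """Element-wise product with explicit Python loops (for comparison only)."""
    rows, cols = prefactor.shape
    result = np.empty((rows, cols), dtype=np.result_type(prefactor, height))
    for i in range(rows):
        for j in range(cols):
            result[i, j] = prefactor[i, j] * height[i, j]
    return result


# Hypothetical sample data standing in for the arrays in the question.
bending_rigidity = 2.0
wave_numbers_norm = np.arange(16, dtype=np.float64).reshape(4, 4)
matrix_prefactor = -bending_rigidity * wave_numbers_norm**2
membrane_height = np.linspace(0.0, 1.0, 16).reshape(4, 4)

# One vectorized call: the loop over elements runs inside NumPy's compiled code.
vectorized = matrix_prefactor * membrane_height

# Same product, written into a preallocated buffer instead of a new array.
bending_forces = np.empty_like(matrix_prefactor)
np.multiply(matrix_prefactor, membrane_height, out=bending_forces)

# Explicit loops give the same values, only much more slowly for large arrays.
looped = multiply_loop(matrix_prefactor, membrane_height)
```

If the arrays are large and calculate is called many times, reusing one output buffer this way may matter more than rewriting the multiplication itself, but that is something to measure rather than assume.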
