I'm currently trying to optimize code that was written in pure Python. The code relies heavily on NumPy, since I work with NumPy arrays. Below is the simplest of my classes, which I have converted to Cython. All it does is multiply two NumPy arrays element-wise:
bendingForces = self.matrixPrefactor * membraneHeight
My question is whether and how I can optimize this, because when I look at the C code generated by "cython -a", there are many NumPy calls that don't look very efficient.
    import numpy as np
    cimport numpy as np

    ctypedef np.float64_t dtype_t
    ctypedef np.complex128_t cplxtype_t
    ctypedef Py_ssize_t index_t

    cdef class bendingForcesClass( object ):
        cdef dtype_t bendingRigidity
        cdef np.ndarray matrixPrefactor
        cdef np.ndarray bendingForces

        def __init__( self, dtype_t bendingRigidity, np.ndarray[dtype_t, ndim=2] waveNumbersNorm ):
            self.bendingRigidity = bendingRigidity
            self.matrixPrefactor = -self.bendingRigidity * waveNumbersNorm**2

        cpdef np.ndarray calculate( self, np.ndarray membraneHeight ) :
            cdef np.ndarray bendingForces
            bendingForces = self.matrixPrefactor * membraneHeight
            return bendingForces
The idea I had was to use two for loops and iterate over the array elements. Perhaps the compiler could then optimize this with SIMD operations? I tried it and got it to compile, but it gave strange results and took forever to run. Here is the code of the replacement function:
    cpdef np.ndarray calculate( self, np.ndarray membraneHeight ) :
        cdef index_t index1, index2  # corresponds to: cdef Py_ssize_t index1, index2
        for index1 in range( self.matrixSize ):
            for index2 in range( self.matrixSize ):
                self.bendingForces[ index1, index2 ] = self.matrixPrefactor.data[ index1, index2 ] * membraneHeight.data[ index1, index2 ]
        return self.bendingForces
As I said, this code is very slow and does not work correctly. What am I doing wrong? What would be the best way to optimize this and remove the NumPy calls?
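For reference, here is a minimal pure-Python sketch of what the loop version is meant to compute, checked against the vectorized product. It is not the Cython code itself: the function name `calculate_loops`, the 4x4 shapes and the input values are made up for illustration only.

```python
import numpy as np

def calculate_loops(matrix_prefactor, membrane_height):
    """Element-by-element product written as two explicit loops (hypothetical helper)."""
    rows, cols = matrix_prefactor.shape
    # Pick a result dtype that can hold the product (float64 or complex128).
    out_dtype = np.result_type(matrix_prefactor, membrane_height)
    bending_forces = np.empty((rows, cols), dtype=out_dtype)
    for i in range(rows):
        for j in range(cols):
            bending_forces[i, j] = matrix_prefactor[i, j] * membrane_height[i, j]
    return bending_forces

# Illustrative inputs (assumed values, not my real data).
bending_rigidity = 2.0
wave_numbers_norm = np.arange(16, dtype=np.float64).reshape(4, 4)
matrix_prefactor = -bending_rigidity * wave_numbers_norm**2
membrane_height = np.linspace(0.0, 1.0, 16).reshape(4, 4)

expected = matrix_prefactor * membrane_height   # the one-line NumPy version
result = calculate_loops(matrix_prefactor, membrane_height)
print(np.allclose(result, expected))
```

Both versions should agree on any pair of equally shaped 2-D arrays; the loop form is only the behaviour I am trying to reproduce in Cython, not a faster alternative in plain Python.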