Jaime may have given a good answer, but I will comment on the improvement of the Cython code and the addition of performance comparison.
First, you must use the annotate function, cython -a filename.pyx , this will create an HTML file. Download this in your browser and highlight the โslowโ lines in yellow-orange, this indicates where you can improve.
In the annotate, two things are immediately discovered that are easily fixed:
Convert idioms into things cython understands
First, these lines are slow:
for xDiff in [-1,0,1]: for yDiff in [-1,0,1]: for zDiff in [-1,0,1]:
The reason for this is that Cython does not know how to convert list iteration to pure c-code. It should be turned into equivalent code that Cython can optimize, namely the โin rangeโ form:
for xDiff in range(-1, 2): for yDiff in range(-1, 2): for zDiff in range(-1, 2):
Array Type for Quick Indexing
The next thing this line is slow:
if array[x1,y1,z1]:
The reason for this is that array not a type. Because of this, it uses python level indexing rather than level indexing. To fix this, you need to specify an array of type, this can be done as follows:
def sphericity(np.ndarray[np.uint8_t, ndim=3] array):
It is assumed that the array type is "uint8", replace it with the appropriate type (Note: Cython does not support the type "np.bool", so I use "uint8")
You can also use a memory view, you cannot use numpy functions in a memory view, but you can create a view on an array and then index the view instead of an array:
cdef np.uint8_t array_view [:, :, :] = array ... if array_view[x1,y1,z1]:
It looks like the kind of memory will be a little faster and make a clear separation between the array (python level calls) and browsing (c level calls). If you do not use numpy functions, you can use memory without problems.
Modify the code to avoid multiple passes through the array
It remains that the calculations of index and nonZeroCount slow, for various reasons, but mainly only refers to the pure size of the data (in fact, iterating over 512 * 512 * 512 elements takes time!) Generally speaking, something that Numpy can do optimized Cython can do faster (usually 2-10 times faster) - numpy just saves you a lot of reinventing the wheel and typing a lot and allows you to think at a higher level (and if you're not also an AC programmer, you might not be able to optimize cython good enough). But in this case itโs easy, you can simply exclude index and nonZeroCount and all related code and just do it:
for x in range(0, xSize): for y in range(0, ySize): for z in range(0, zSize): if array[x,y,z] == 0: continue ...
This is very fast, since c (which is pure Cython compiles to an impeccable level) has no problem doing billions of operations per second. Excluding the index and nonZeroCount , you essentially save two whole iterations over the entire array, which even at maximum speed require a minimum of about 0.1 seconds each. Even more important is processor caching, the entire array is 128 MB, much larger than the cache processor, so all in one pass uses the processor cache better (multiple passes will not matter much if the arrays fit fully into the CPU cache).
Optimized version
Here is the complete code for my optimized version:
#cython: boundscheck=False, nonecheck=False, wraparound=False import numpy as np cimport numpy as np def sphericity2(np.uint8_t [:, :, :] array): #Pass an mask array (1 are marked, 0 ignored) cdef int xSize,ySize,zSize xSize=array.shape[0] ySize=array.shape[1] zSize=array.shape[2] cdef int sa,vol,x,y,z,neighbors,xDiff,yDiff,zDiff,x1,y1,z1 cdef float onethird,twothirds,sp sa=vol=0 #keep running tally of volume and surface area for x in range(0, xSize): for y in range(0, ySize): for z in range(0, zSize): if array[x,y,z] == 0: continue neighbors=0 vol+=1 for xDiff in range(-1, 2): for yDiff in range(-1, 2): for zDiff in range(-1, 2): if abs(xDiff)+abs(yDiff)+abs(zDiff)==1: x1=x+xDiff y1=y+yDiff z1=z+zDiff if x1>=0 and y1>=0 and z1>=0 and x1<xSize and y1<ySize and z1<zSize: #print '-',x1,y1,z1,array[x1,y1,z1] if array[x1,y1,z1]: #print '-',x1,y1,z1,array[x1,y1,z1] neighbors+=1 #print 'had this many neighbors',neighbors sa+=(6-neighbors) onethird=float(1)/float(3) twothirds=float(2)/float(3) sph = ((np.pi**onethird)*((6*vol)**twothirds)) / sa #print 'sphericity',sphericity return sph
Sphericity Runtime Comparison:
Original: 2.123s
Jaime: 1.819s
Optimized Cython: 0.136s
@ moarningsun: 0.090s
Throughout the solution, Cython runs more than 15 times faster, with deployed internal loops (see comment), it runs more than 23 times.