I am trying to use Cython to speed up date comparisons when I am given an array of NumPy datetimes (or the components needed to build them). As a first step, I measured how much Cython speeds up a simple comparison of integers.
```python
import numpy as np

testArrayInt = np.load("testArray.npy")
```
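The contents of testArray.npy are not shown. To reproduce a comparable benchmark, a stand-in array of the same size could be generated as below. The value range and seed are arbitrary assumptions, not taken from the original data. The dtype matters: with Cython's buffer syntax, the array's dtype must match the type in the function signature (here `np.int_t`), otherwise the call fails with a buffer dtype mismatch error.

```python
import numpy as np

# Hypothetical stand-in for testArray.npy: 1,000,000 random integers.
# Range [-10, 10) and seed 0 are arbitrary choices for illustration.
rng = np.random.default_rng(seed=0)
testArrayInt = rng.integers(-10, 10, size=1_000_000, dtype=np.int_)
```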
Python Method:
```python
def processInt(array):
    compareSuccess = 0
    testValue = 1
    # Count the elements smaller than testValue
    for counter in range(array.shape[0]):
        if testValue > array[counter]:
            compareSuccess += 1
    print(compareSuccess)
```
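For reference, the same count can be computed with no explicit Python loop by letting NumPy perform the comparison in C. This is a minimal sketch; the name `processIntVectorized` is hypothetical and not part of the original code.

```python
import numpy as np

def processIntVectorized(array, testValue=1):
    # array < testValue builds a boolean mask in C;
    # count_nonzero then counts the True entries
    return int(np.count_nonzero(array < testValue))

print(processIntVectorized(np.array([0, 1, 2, -3])))  # 2
```

This is often a useful baseline: if a hand-written Cython loop is not faster than the vectorized NumPy expression, the extra compilation step may not be worth it.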
Cython Method:
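```cython
import numpy as np
cimport numpy as np

def processInt(np.ndarray[np.int_t, ndim=1] array):
    cdef int rows = array.shape[0]
    cdef int counter = 0
    cdef int compareSuccess = 0
    cdef int testValue = 1
    for counter in range(rows):
        # Typed loop: indexing, comparison and increment all compile to C
        if testValue > array[counter]:
            compareSuccess = compareSuccess + 1
    print(compareSuccess)
```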
Timing both versions on an array of 1,000,000 integers:
Python: 0.204969 seconds
Cython: 0.000826 seconds
Speedup: approx. 250 times
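The post does not say how these times were measured. One way to get comparable numbers is `timeit.repeat`, calling each function on the same array and keeping the best of several runs. A minimal sketch: `count_loop` is a hypothetical stand-in for `processInt` that returns its count instead of printing it, so the snippet runs on its own, and the array size is reduced to keep the run short.

```python
import timeit
import numpy as np

def count_loop(array, testValue=1):
    # Hypothetical stand-in for processInt that returns the count
    compareSuccess = 0
    for counter in range(array.shape[0]):
        if testValue > array[counter]:
            compareSuccess += 1
    return compareSuccess

array = np.random.default_rng(0).integers(-10, 10, size=100_000)
# Best of 5 repeats, one call each, to reduce noise from other processes
best = min(timeit.repeat(lambda: count_loop(array), repeat=5, number=1))
print(f"{best:.6f} seconds")
```

Timing the compiled Cython function the same way (after importing it from its extension module) keeps the comparison fair.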
I then repeated the exercise with dates. Since Cython did not accept the datetime array, I split it into an integer array with one column each for year, month, and day, and passed that array to both methods.
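The post doesn't show how the split was done. Assuming the dates can be held as a NumPy `datetime64[D]` array, one vectorized way to build the (N, 3) year/month/day array is to truncate to coarser units and take differences. The function name `splitDates` is hypothetical.

```python
import numpy as np

def splitDates(dates):
    # Normalize to day precision (accepts datetime64 of any unit or ISO strings)
    dates = np.asarray(dates).astype("datetime64[D]")
    years = dates.astype("datetime64[Y]")   # truncated to the year
    months = dates.astype("datetime64[M]")  # truncated to the month
    y = years.astype(np.int64) + 1970       # years since 1970 -> calendar year
    m = months.astype(np.int64) % 12 + 1    # months since 1970-01 -> 1..12
    # Days since the first of the month -> 1..31
    d = (dates - months.astype("datetime64[D]")).astype(np.int64) + 1
    return np.column_stack((y, m, d))

sample = np.array(["2009-03-15", "1969-12-31"], dtype="datetime64[D]")
print(splitDates(sample).tolist())  # [[2009, 3, 15], [1969, 12, 31]]
```

NumPy's `%` follows Python's sign convention, so the month formula also works for dates before 1970.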
```python
testArrayDateTime = np.load("testArrayDateTime.npy")
```
Python Code:
```python
from datetime import datetime

def processDateTime(array):
    compareSuccess = 0
    d = datetime(2009, 1, 1)
    rows = array.shape[0]
    for counter in range(rows):
        # Build a datetime object from the year, month and day columns
        dTest = datetime(array[counter][0], array[counter][1], array[counter][2])
        if d > dTest:
            compareSuccess += 1
    print(compareSuccess)
```
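Calendar dates sort lexicographically by (year, month, day). As a result, the test `d > dTest` can be done without constructing any datetime objects by encoding each row as the integer `year*10000 + month*100 + day`. This preserves ordering as long as month and day stay below 100, which holds for valid dates with non-negative years. A vectorized sketch; `processDateKeys` is a hypothetical name.

```python
import numpy as np

def processDateKeys(array, year=2009, month=1, day=1):
    # array: (N, 3) integer array with year, month, day columns
    keys = array[:, 0] * 10000 + array[:, 1] * 100 + array[:, 2]
    threshold = year * 10000 + month * 100 + day
    # Same test as d > dTest, applied to every row at once
    return int(np.count_nonzero(keys < threshold))

rows = np.array([[2008, 12, 31], [2009, 1, 1], [2010, 6, 15]])
print(processDateKeys(rows))  # 1
```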
Cython Code:
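```cython
import numpy as np
cimport numpy as np
from cpython.datetime cimport date, import_datetime

import_datetime()

def processDateTime(np.ndarray[np.int_t, ndim=2] array):
    cdef int compareSuccess = 0
    cdef int rows = array.shape[0]
    cdef int counter = 0
    cdef date d = date(2009, 1, 1)
    cdef date dTest
    for counter in range(rows):
        # date(...) still creates a Python object on every iteration
        dTest = date(array[counter, 0], array[counter, 1], array[counter, 2])
        if d > dTest:
            compareSuccess = compareSuccess + 1
    print(compareSuccess)
```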
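A likely reason for the small gain: in the Cython version, each iteration still calls `date(...)`, which allocates a Python object, and `d > dTest` goes through Python's comparison protocol. Only the loop itself runs as C. One possible change is to keep everything as integers so the loop body needs no Python objects at all. The loop below is written in plain Python so it can be checked; in Cython, the same body with the parameters and locals declared as C integers (`cdef int`) should compile to plain C. The function name is hypothetical, and no timing is claimed.

```python
import numpy as np

def processDateTimeInts(array, year=2009, month=1, day=1):
    # Compare (year, month, day) lexicographically using only integers.
    compareSuccess = 0
    rows = array.shape[0]
    for counter in range(rows):
        y = array[counter, 0]
        m = array[counter, 1]
        dd = array[counter, 2]
        # Equivalent to date(year, month, day) > date(y, m, dd)
        if year > y or (year == y and (month > m or (month == m and day > dd))):
            compareSuccess += 1
    return compareSuccess

rows = np.array([[2008, 12, 31], [2009, 1, 1], [2010, 6, 15]])
print(processDateTimeInts(rows))  # 1
```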
Performance:
Python: 0.865261 seconds
Cython: 0.162297 seconds
Speedup: approx. 5 times
Why is the speedup so much smaller for dates, and how could I increase it?
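If the dates are available as a NumPy `datetime64` array in the first place, splitting them may be unnecessary. `datetime64` values are stored as 64-bit integer offsets from the 1970 epoch, so NumPy can compare them elementwise in C, and the whole count becomes one vectorized expression. A sketch, assuming `testArrayDateTime` has a `datetime64` dtype (the post does not state it); `processDatetime64` is a hypothetical name.

```python
import numpy as np

def processDatetime64(dates, threshold="2009-01-01"):
    # Elementwise comparison in C; no Python date objects are created
    return int(np.count_nonzero(dates < np.datetime64(threshold)))

dates = np.array(["2008-12-31", "2009-01-01", "2010-06-15"], dtype="datetime64[D]")
print(processDatetime64(dates))  # 1
```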