Michael, an interesting question. It looks like the main performance problem is that every pixel in the image has integrate() evaluated on it twice: once with a 3x3 window and once with a window whose size isn't known in advance. Computing lots of individual integrals this way is extremely inefficient no matter what functions you use; it's an algorithmic problem, not an implementation problem. Consider an image of size N*N. You can compute all the integrals of any size K*K in that image using only about 4*N*N operations, not (as you might naively expect) N*N*K*K. The way to do this is to first compute an image of sliding sums over a window of K along each row, and then sliding sums over a window of K down each column of the result. Updating a sliding sum to move to the next pixel requires only adding the newest pixel of the current window and subtracting the oldest pixel of the previous window, i.e. two operations per pixel regardless of the window size. We have to do this twice (for rows and columns), hence 4 operations per pixel.
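To make the two-operations-per-step update concrete, here is a minimal 1-D sketch; rolling_sum_1d is my own illustrative name, not code from the question:

    import numpy

    def rolling_sum_1d(x, K):
        # x: 1-D numpy array; returns out with out[i] == x[i:i+K].sum().
        out = numpy.empty(len(x) - K + 1, dtype=numpy.int64)
        s = int(x[:K].sum())        # sum of the first window
        out[0] = s
        for i in range(1, len(out)):
            # Two operations per step: add the element entering the
            # window, subtract the one leaving -- independent of K.
            s += int(x[i + K - 1]) - int(x[i - 1])
            out[i] = s
        return out

For example, rolling_sum_1d(numpy.arange(10), 3) gives [3, 6, 9, 12, 15, 18, 21, 24].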
I'm not sure whether numpy has a sliding window sum built in, but this answer suggests several ways to do it using stride tricks: fooobar.com/questions/1447662/... You can, of course, do the same thing with one loop over rows and one loop over columns (using slices to extract each row/column).
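If your numpy is recent enough (1.20+), a stride-trick view of the kind that answer describes is available directly as numpy.lib.stride_tricks.sliding_window_view; a sketch, with img and K as toy stand-ins:

    import numpy
    from numpy.lib.stride_tricks import sliding_window_view

    img = numpy.arange(36).reshape(6, 6)   # toy 2-D image for illustration
    K = 3

    # Zero-copy view of every K x K window; shape (N-K+1, N-K+1, K, K).
    windows = sliding_window_view(img, (K, K))
    all_sums = windows.sum(axis=(2, 3))    # all K x K window sums

Note that summing the view still does K*K arithmetic per output pixel, so for large K the two-pass rolling-sum scheme described above remains asymptotically cheaper.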
Example:
    import numpy

    # img is a 2D ndarray
    # K is the size of the sums to compute with a sliding window
    # (note: zeros_like inherits img's dtype; for a small integer dtype
    # such as uint8, convert with img.astype(numpy.int64) first to
    # avoid overflow)

    # Pass 1: sliding sums over K consecutive rows, whole rows at a time.
    row_sums = numpy.zeros_like(img)
    for i in range(img.shape[0]):
        if i >= K:
            # Full window: drop the row leaving the window, add the new row.
            row_sums[i,:] = row_sums[i-1,:] - img[i-K,:] + img[i,:]
        elif i > 0:
            # Window still filling up: just accumulate.
            row_sums[i,:] = row_sums[i-1,:] + img[i,:]
        else:  # i == 0
            row_sums[i,:] = img[i,:]

    # Pass 2: sliding sums over K consecutive columns of row_sums.
    col_sums = numpy.zeros_like(img)
    for j in range(img.shape[1]):
        if j >= K:
            col_sums[:,j] = col_sums[:,j-1] - row_sums[:,j-K] + row_sums[:,j]
        elif j > 0:
            col_sums[:,j] = col_sums[:,j-1] + row_sums[:,j]
        else:  # j == 0
            col_sums[:,j] = row_sums[:,j]

    # Now col_sums[i,j] == numpy.sum(img[i-K+1:i+1, j-K+1:j+1]) for i, j >= K-1.
    # The first K-1 rows and columns hold partial sums and can be ignored.
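A quick brute-force check of the loops above (my own addition; it assumes img and K are the same ones the loops ran with):

    # Every full K x K window sum should match a direct numpy.sum.
    for i in range(K - 1, img.shape[0]):
        for j in range(K - 1, img.shape[1]):
            assert col_sums[i, j] == numpy.sum(img[i-K+1:i+1, j-K+1:j+1])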
How do you best apply this to your case? I think you may want to precompute the integrals for 3x3 (the average depth), as well as for several larger sizes, and use the 3x3 value to select one of the larger sizes for the detection window (assuming I understand the intent of your algorithm). The range of larger sizes you need may be limited, and even if it isn't, artificially limiting it may still work reasonably well: just pick the closest size. Computing all the integrals together with sliding sums is so much more efficient that I'm fairly sure it's worth computing them for many sizes you'll never use at any particular pixel, especially if some of the sizes are large.
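If that reading of the algorithm is right, the whole thing might look roughly like this. sliding_sums, the sizes list, and pick_size are all my own illustrative names; sliding_sums is just the two-pass scheme above in an equivalent cumsum-based vectorized form:

    import numpy

    def sliding_sums(img, K):
        # Two-pass sliding sums, vectorized: cumulative sums along each
        # axis, then subtract the prefix that left the window.
        work = img.astype(numpy.int64)          # widen to avoid overflow
        row_sums = numpy.cumsum(work, axis=0)
        row_sums[K:] -= row_sums[:-K].copy()
        col_sums = numpy.cumsum(row_sums, axis=1)
        col_sums[:, K:] -= col_sums[:, :-K].copy()
        return col_sums   # same partial-sum caveat for the first K-1 rows/cols

    img = numpy.arange(100).reshape(10, 10)     # toy image for illustration
    sizes = [5, 7, 9]                           # assumed menu of window sizes
    sums_by_size = {K: sliding_sums(img, K) for K in sizes}
    mean3 = sliding_sums(img, 3) / 9.0          # 3x3 average (the average depth)

    def pick_size(wanted):
        # Snap a per-pixel desired window size to the nearest precomputed one.
        return min(sizes, key=lambda s: abs(s - wanted))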
PS This is a minor addition, but you can avoid calling intersect() for each pixel: either (a) only process pixels that are farther from the edge than the largest integral size, or (b) pad the image on all sides with a border of the largest integral size, filling the border with zeros or nans, or (c) (the best approach) use slices and let them take care of it automatically: a slice index beyond the edge of an ndarray is automatically clipped to the edge, except of course that negative indices wrap around.
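A minimal sketch of option (c); the max(0, ...) clamp is my addition, needed to stop a negative start index from wrapping around:

    import numpy

    img = numpy.arange(64).reshape(8, 8)   # toy image
    K = 5
    i, j = 1, 6                            # pixel near two edges

    # K x K window ending at (i, j). numpy clips slice bounds that run
    # past the end of the array, but a negative start would wrap around,
    # so the lower bounds are clamped to 0 by hand.
    window = img[max(0, i - K + 1):i + 1, max(0, j - K + 1):j + 1]
    total = window.sum()   # sum over the part of the window inside the image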
EDIT: Added example of sliding window sums