Efficient processing of pixel + neighborhood in a numpy image

I have a depth image of a scene. I traverse the image and compute the average change in depth within a detection window, whose size is adjusted based on the average depth of the pixels surrounding the current location. I accumulate the average change to build a simple response image.

Most of the time is spent in the for loop; it takes about 40+ seconds for a 512x512 image on my machine. I was hoping for some way to speed this up. Is there a more efficient / faster way to traverse the image? Is there a more pythonic / numpy / scipy way to visit each pixel? Or should I just learn Cython?

EDIT: I reduced the run time to about 18 seconds by using scipy.misc.imread() instead of skimage.io.imread(). I don't know what the difference is; I will try to investigate.

Here is a simplified version of the code:

```python
import matplotlib.pylab as plt
import numpy as np
from skimage.io import imread
from skimage.transform import integral_image, integrate
import time

def intersect(a, b):
    '''Determine the intersection of two rectangles'''
    rect = (0, 0, 0, 0)
    r0 = max(a[0], b[0])
    c0 = max(a[1], b[1])
    r1 = min(a[2], b[2])
    c1 = min(a[3], b[3])
    # Do we have a valid intersection?
    if r1 > r0 and c1 > c0:
        rect = (r0, c0, r1, c1)
    return rect

# Setup data
depth_src = imread("test.jpg", as_grey=True)
depth_intg = integral_image(depth_src)      # integrate to find sum depth in region
depth_pts = integral_image(depth_src > 0)   # integrate to find num points which have depth
boundary = (0, 0, depth_src.shape[0] - 1, depth_src.shape[1] - 1)  # rectangle to intersect with

# Image to accumulate response
out_img = np.zeros(depth_src.shape)

# Average dimensions of bbox/detection window per unit length of depth
model = (0.602, 2.044)  # width, height

start_time = time.time()

for (r, c), junk in np.ndenumerate(depth_src):
    # Find points around current pixel
    r0, c0, r1, c1 = intersect((r - 1, c - 1, r + 1, c + 1), boundary)

    # Calculate average of depth of points around current pixel
    scale = integrate(depth_intg, r0, c0, r1, c1) * 255 / 9.0

    # Based on average depth, create the detection window
    r0 = r - (model[0] * scale / 2)
    c0 = c - (model[1] * scale / 2)
    r1 = r + (model[0] * scale / 2)
    c1 = c + (model[1] * scale / 2)

    # Use scale-optimised detection window to extract features
    r0, c0, r1, c1 = intersect((r0, c0, r1, c1), boundary)
    depth_count = integrate(depth_pts, r0, c0, r1, c1)
    if depth_count:
        depth_sum = integrate(depth_intg, r0, c0, r1, c1)
        avg_change = depth_sum / depth_count
        # Accumulate response
        out_img[r0:r1, c0:c1] += avg_change

print time.time() - start_time, " seconds"

plt.imshow(out_img)
plt.gray()
plt.show()
```
1 answer
Michael, an interesting question. It seems the main performance problem is that two integrate() calls are computed for every pixel in the image, one over a 3x3 window and one over a window whose size is not known in advance. Computing individual integrals this way is extremely inefficient no matter what functions you use; it is an algorithmic problem, not an implementation problem. Consider an image of size N*N. You can compute all the integrals of any size K*K over this image using only about 4*N*N operations, not (as you might naively expect) N*N*K*K. The way to do this is to first compute the image of sliding sums over a window of width K along each row, and then compute sliding sums over that result along each column. Updating a sliding sum when moving to the next pixel requires only adding the newest pixel of the current window and subtracting the oldest pixel of the previous window, i.e. two operations per pixel regardless of the window size. This has to be done twice (for rows and for columns), hence 4 operations per pixel.

I'm not sure whether numpy has a built-in sliding window sum, but this answer suggests several ways to do it using stride tricks: fooobar.com/questions/1447662/.... You can, of course, achieve the same thing with one loop over rows and one loop over columns (taking slices to extract each row / column).
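As a quick sketch (not from the linked answer): modern NumPy (1.20+) exposes the stride-trick idea directly as `numpy.lib.stride_tricks.sliding_window_view`, which views every KxK window without copying:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.arange(36, dtype=float).reshape(6, 6)
K = 3

# View of shape (6-K+1, 6-K+1, K, K): one KxK window per valid position.
# sums[i, j] is the sum over img[i:i+K, j:j+K].
sums = sliding_window_view(img, (K, K)).sum(axis=(-2, -1))
```

Note that this is concise but still touches K*K elements per window when summing, so it does not achieve the two-operations-per-pixel update described above; it is mainly a convenient way to get correct window sums for comparison.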

Example:

```python
# img is a 2D ndarray
# K is the size of sums to calculate using a sliding window
row_sums = numpy.zeros_like(img)
for i in range(img.shape[0]):
    if i > K:
        row_sums[i, :] = row_sums[i-1, :] - img[i-K-1, :] + img[i, :]
    elif i > 0:
        row_sums[i, :] = row_sums[i-1, :] + img[i, :]
    else:  # i == 0
        row_sums[i, :] = img[i, :]

col_sums = numpy.zeros_like(img)
for j in range(img.shape[1]):
    if j > K:
        col_sums[:, j] = col_sums[:, j-1] - row_sums[:, j-K-1] + row_sums[:, j]
    elif j > 0:
        col_sums[:, j] = col_sums[:, j-1] + row_sums[:, j]
    else:  # j == 0
        col_sums[:, j] = row_sums[:, j]

# here col_sums[i, j] equals numpy.sum(img[i-K:i+1, j-K:j+1]) if i >= K and j >= K
# the first K rows and columns of col_sums contain partial sums and can be ignored
```
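To make the recurrences easy to sanity-check, here is the same logic wrapped in a function together with a brute-force comparison (a hypothetical snippet, not part of the original answer). Note that with these updates the window ending at (i, j) covers img[i-K:i+1, j-K:j+1], i.e. (K+1) x (K+1) pixels:

```python
import numpy as np

def windowed_sums(img, K):
    """Sliding window sums via the row-then-column rolling update."""
    row_sums = np.zeros_like(img)
    for i in range(img.shape[0]):
        if i > K:
            # Drop the oldest row of the window, add the newest
            row_sums[i, :] = row_sums[i-1, :] - img[i-K-1, :] + img[i, :]
        elif i > 0:
            row_sums[i, :] = row_sums[i-1, :] + img[i, :]
        else:  # i == 0
            row_sums[i, :] = img[i, :]
    col_sums = np.zeros_like(img)
    for j in range(img.shape[1]):
        if j > K:
            col_sums[:, j] = col_sums[:, j-1] - row_sums[:, j-K-1] + row_sums[:, j]
        elif j > 0:
            col_sums[:, j] = col_sums[:, j-1] + row_sums[:, j]
        else:  # j == 0
            col_sums[:, j] = row_sums[:, j]
    return col_sums

# Brute-force check: the window ending at (i, j) is img[i-K:i+1, j-K:j+1]
img = np.arange(36).reshape(6, 6)
sums = windowed_sums(img, 2)
for i in range(2, 6):
    for j in range(2, 6):
        assert sums[i, j] == img[i-2:i+1, j-2:j+1].sum()
```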

How best to apply this to your case? I think you may want to precompute the integrals for 3x3 (the average depth) as well as for several larger sizes, and use the 3x3 value to select one of the larger sizes for the detection window (assuming I understand the intent of your algorithm correctly). The range of larger sizes you need may already be limited, and even artificially limiting it may still work reasonably well: just select the nearest available size. Computing all the integrals together using sliding sums is so much more efficient that I'm pretty sure it is worth computing them even for many sizes you will never use at any particular pixel, especially if some of the sizes are large.
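As a sketch of the "limited set of sizes" idea (all names hypothetical, and using cumulative sums rather than the rolling update for brevity): precompute window sums for a few sizes, then index into the stack per pixel with the nearest size:

```python
import numpy as np

def box_sums(img, K):
    """Sum over each KxK window whose bottom-right corner is (i, j)."""
    c = np.cumsum(np.cumsum(img, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)), mode='constant')
    out = np.zeros(img.shape, dtype=c.dtype)
    # Inclusion-exclusion on the 2D cumulative sum
    out[K-1:, K-1:] = c[K:, K:] - c[:-K, K:] - c[K:, :-K] + c[:-K, :-K]
    return out

# Precompute sums for a limited set of window sizes
img = np.random.default_rng(0).random((64, 64))
sizes = np.array([3, 7, 15, 31])
stacks = np.stack([box_sums(img, k) for k in sizes])

# Per-pixel desired window size (here a constant placeholder standing in
# for the size derived from the 3x3 average depth at each pixel)
desired = np.full(img.shape, 10.0)

# Pick the nearest precomputed size for each pixel
idx = np.abs(sizes[None, None, :] - desired[..., None]).argmin(axis=-1)
response = np.take_along_axis(stacks, idx[None, ...], axis=0)[0]
```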

PS This is a minor point, but you can avoid calling intersect() for every pixel: either (a) only process pixels that are farther from the edge than the maximum integral size, or (b) pad the image on all sides by the maximum integral size, filling the padding with zeros or NaNs, or (c) (the best approach) use slices to take care of this automatically: a slice index beyond the boundary of an ndarray is automatically clipped to the boundary, except of course that negative indices wrap around.
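For example, a quick illustration of the slice-clipping behaviour:

```python
import numpy as np

a = np.arange(5)   # array([0, 1, 2, 3, 4])
print(a[3:100])    # stop index past the end is clipped: [3 4]
print(a[2:2])      # empty slice, no error: []
print(a[-2:])      # negative indices count from the end: [3 4]
```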

EDIT: Added an example of sliding window sums
