Python - combining argsort with masking to get closest values in a moving window

I have code that infills missing values in an image based on neighbouring values in a two-dimensional circular window. It also uses values from one or more temporally adjacent images at the same locations (i.e. the same 2D window shifted along the 3rd dimension).

For each missing position, I need to calculate the value based not necessarily on all of the values available in the whole window, but only on the spatially nearest n cells that do have values (in both images / positions along the Z axis), where n is some value less than the total number of cells in the 2D window.

At the moment, it is much faster to calculate everything in the window and then mask afterwards: the sorting required to get the nearest n cells with data is the slowest part of the function, because it has to be repeated every time, even though the distances in terms of window coordinates never change. I'm not sure this is necessary, and I feel I ought to be able to sort the distances once and then mask that sorted order when selecting only the available cells.

Here is my code for selecting the data to use within a window around the gap cell:

    import numpy as np

    # radius will in reality be ~100
    radius = 2
    y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
    dist = np.sqrt(x**2 + y**2)
    circle_template = dist > radius

    # this will in reality be a very large 3 dimensional array
    # representing daily images with some gaps, indicated by 0s
    dataStack = np.zeros((2, 5, 5))
    dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
    dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)

    testdata = dataStack[1]
    alternatedata = dataStack[0]
    random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
    testdata[random_gap_locations] = 0
    testdata[radius, radius] = 0

    # in reality we will go through every gap (zero) location in the data
    # for each image, and for each gap use slicing to get a window of
    # size (radius*2+1, radius*2+1) around it from each image, with the
    # gap being at the centre, i.e.
    # testgaplocation = [radius, radius]
    # and the variables testdata, alternatedata below will refer to these
    # slices

    locations_to_exclude = np.logical_or(circle_template,
                                         np.logical_or(testdata == 0,
                                                       alternatedata == 0))
    # the places that are inside the circular mask and where both images
    # have data
    locations_to_include = ~locations_to_exclude
    number_available = np.count_nonzero(locations_to_include)

    # we only want to do the interpolation calculations from the nearest n
    # locations that have data available, n will be ~100 in reality
    number_required = 3

    available_distances = dist[locations_to_include]
    available_data = testdata[locations_to_include]
    available_alternates = alternatedata[locations_to_include]

    if number_available > number_required:
        # we need to find the closest number_required elements, based on
        # the distances recorded in dist, from available_data and
        # available_alternates.
        # Having to repeat this argsort for each gap cell calculation is
        # slow and feels like it should be avoidable
        sortedDistanceIndices = available_distances.argsort(kind='mergesort', axis=None)
        requiredIndices = sortedDistanceIndices[0:number_required]
        selected_data = np.take(available_data, requiredIndices)
        selected_alternates = np.take(available_alternates, requiredIndices)
    else:
        # we just use available_data and available_alternates as they are...
        pass

    # now do stuff with the selected data to calculate a value for the gap cell

This works, but more than half of the total function time is taken by the argsort of the masked spatial distance data (~900 µs out of just 1.4 ms, and this function will run tens of billions of times, so this is an important difference!).

I am sure I should be able to do this argsort just once, outside the function, when the spatial distance window is originally set up, and then include those sort indices in the masking to get the first howManyToCalculate indices without having to re-sort. Perhaps the answer involves putting the various bits we are extracting into a record array, but I can't figure out how to do that. Can anyone see how I can make this part of the process more efficient?

2 answers

So you want to take the sorting out of the loop:

 sorted_dist_idcs = dist.argsort(kind='mergesort', axis=None) 

Then, using some variables from the original code, this is what I could come up with, though it still feels like quite a round-trip.

    loc_to_incl_sorted = locations_to_include.take(sorted_dist_idcs)
    sorted_dist_idcs_to_incl = sorted_dist_idcs[loc_to_incl_sorted]

    required_idcs = sorted_dist_idcs_to_incl[:number_required]
    selected_data = testdata.take(required_idcs)
    selected_alternates = alternatedata.take(required_idcs)

Note that required_idcs refers to locations in testdata, not available_data as in the original code. And I used take in this snippet because it's convenient for indexing the flattened array.
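A minimal, self-contained check of this idea (the availability mask here is a random stand-in, not the real data): sorting dist once outside the loop and then filtering the pre-sorted indices through the mask selects exactly the same distances as re-sorting the masked distances for each gap.

```python
import numpy as np

# Hedged sketch: verify that sorting `dist` once, then masking the
# pre-sorted indices, matches re-sorting the masked distances per gap.
radius = 2
y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
dist = np.sqrt(x**2 + y**2)
number_required = 3

rng = np.random.default_rng(0)
locations_to_include = rng.random(dist.shape) > 0.4  # random stand-in mask

# one-off, outside the per-gap loop
sorted_dist_idcs = dist.argsort(kind='mergesort', axis=None)

# per gap: filter the pre-sorted indices through the availability mask
loc_to_incl_sorted = locations_to_include.take(sorted_dist_idcs)
required_idcs = sorted_dist_idcs[loc_to_incl_sorted][:number_required]

# reference: the original approach, re-sorting the masked distances
available_distances = dist[locations_to_include]
ref_dists = np.sort(available_distances, kind='mergesort')[:number_required]
```

Because both sorts are stable mergesorts and the masked array preserves the flattened order, ties come out in the same order in both approaches.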


@moarningsun - thanks for the comment and answer. This set me on the right track, but it doesn't quite work for me when the gap is < radius from the edge of the data: in that case I use a window around the gap cell which is "trimmed" to the data bounds. In that situation the indices reflect the "full" window and therefore can't be used to select cells from the bounded window.

Unfortunately I edited that part of my code out when I clarified the original question, but it turns out to be relevant.

I've now realised that if you use argsort again on the output of argsort, you get ranks; that is, the position each element would have in the overall sorted array. We can safely mask these and then take the smallest number_required of them (and do this on a structured array to get the corresponding data at the same time).
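The argsort-of-argsort trick on a tiny example (values chosen arbitrarily to show how a tie is handled):

```python
import numpy as np

a = np.array([3.0, 1.0, 2.0, 1.0])

# First argsort: for each sorted slot, which element lands there.
# Second argsort: for each element, which slot it lands in, i.e. its rank.
# mergesort is stable, so the tied 1.0s keep their original order.
order = a.argsort(kind='mergesort')
ranks = order.argsort()

# a[1] is the smallest -> rank 0; a[3] is the tied 1.0 -> rank 1;
# a[2] -> rank 2; a[0] is the largest -> rank 3
```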

This implies another sort within the loop, but in fact we can use a partition rather than a full sort, because all we need is the smallest num_required items. If num_required is substantially less than the number of data items, this is much faster than doing the argsort.

For example, with num_required = 80 and num_available = 15000, a full argsort takes ~900 µs, whereas an argpartition followed by index and slice to get the first 80 takes ~110 µs. We still need to do the argsort to get the ranks at the outset (rather than simply partitioning on distance) in order to get the stability of the mergesort, and thus get the "right ones" when the distances are not unique.
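A sketch of that comparison (sizes taken from the figures above; the rank array is a random permutation standing in for the real ranks). argpartition returns the same set of smallest elements as a full argsort, just without ordering them:

```python
import numpy as np

rng = np.random.default_rng(1)
ranks_subset = rng.permutation(15000)  # stand-in for data_to_use['ranks']
num_required = 80

# full sort (~900 us at this size, per the figures above)
full_idcs = np.argsort(ranks_subset, kind='mergesort')[:num_required]

# partition (~110 us): only guarantees the num_required smallest come first,
# in no particular order
part_idcs = np.argpartition(ranks_subset, num_required)[:num_required]
```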

My code, as shown below, now runs in ~610 µs on real data, including the actual calculations that aren't shown here. I'm happy with that now, but there seem to be a couple of other apparently minor factors that can have an influence on the runtime which is hard to understand.

For example, putting circle_template in the structured array alongside dist, ranks, and another field not shown here doubles the runtime of the overall function (even if we don't access circle_template in the loop!). Even worse, using np.partition on the structured array with order=['ranks'] increases the overall function runtime by almost 2 orders of magnitude vs using np.argpartition as shown below!
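That last difference can be probed with a rough timeit sketch (array size and field layout here are illustrative, not the real data). Both calls select the same records, but np.partition has to move whole structured records around while np.argpartition only shuffles an integer index array:

```python
import timeit
import numpy as np

n, k = 15000, 80
rec = np.zeros(n, dtype=[('ranks', 'i8'), ('thisdata', 'f8'), ('dist', 'f8')])
rec['ranks'] = np.random.permutation(n)

# fast path: partition an index array over the plain integer field
t_arg = timeit.timeit(lambda: np.argpartition(rec['ranks'], k)[:k], number=200)

# slow path: partition the structured array itself by field
t_struct = timeit.timeit(lambda: np.partition(rec, k, order=['ranks'])[:k],
                         number=200)

print(f"argpartition: {t_arg:.4f}s  partition(order=['ranks']): {t_struct:.4f}s")
```

The exact ratio will vary by machine and record size, but the argpartition route should be consistently faster.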

    import numpy as np

    # radius will in reality be ~100
    radius = 2
    y, x = np.ogrid[-radius:radius+1, -radius:radius+1]
    dist = np.sqrt(x**2 + y**2)
    circle_template = dist > radius
    ranks = dist.argsort(axis=None, kind='mergesort').argsort().reshape(dist.shape)
    diam = radius * 2 + 1

    # this will in reality be a very large 3 dimensional array
    # representing daily images with some gaps, indicated by 0s
    dataStack = np.zeros((2, 5, 5))
    dataStack[1] = (np.random.random(25) * 100).reshape(dist.shape)
    dataStack[0] = (np.random.random(25) * 100).reshape(dist.shape)

    testdata = dataStack[1]
    alternatedata = dataStack[0]
    random_gap_locations = (np.random.random(25) * 30).reshape(dist.shape) > testdata
    testdata[random_gap_locations] = 0
    testdata[radius, radius] = 0

    # putting circle_template in this array too doubles overall function runtime!
    # (the original code took these dtypes from dayDataStack / spatialDist, the
    # real full data arrays; dataStack / dist are substituted so this snippet runs)
    fullWindowArray = np.zeros((diam, diam),
                               dtype=[('ranks', ranks.dtype.str),
                                      ('thisdata', dataStack.dtype.str),
                                      ('alternatedata', dataStack.dtype.str),
                                      ('dist', dist.dtype.str)])
    fullWindowArray['ranks'] = ranks
    fullWindowArray['dist'] = dist

    # in reality we will loop here to go through every gap (zero) location
    # in the data for each image
    gapz, gapy, gapx = 1, radius, radius
    desLeft, desRight = gapx - radius, gapx + radius + 1
    desTop, desBottom = gapy - radius, gapy + radius + 1
    extentB, extentR = dataStack.shape[1:]

    # handle the case where the gap is < search radius from the edge of
    # the data. If this is the case, we can't use the full diam * diam window
    dataL = max(0, desLeft)
    maskL = 0 if desLeft >= 0 else abs(dataL - desLeft)
    dataT = max(0, desTop)
    maskT = 0 if desTop >= 0 else abs(dataT - desTop)
    dataR = min(desRight, extentR)
    maskR = diam if desRight <= extentR else diam - (desRight - extentR)
    dataB = min(desBottom, extentB)
    maskB = diam if desBottom <= extentB else diam - (desBottom - extentB)

    # get the slice that we will be working within
    # ranks and dist are already populated
    boundedWindowArray = fullWindowArray[maskT:maskB, maskL:maskR]
    boundedWindowArray['alternatedata'] = alternatedata[dataT:dataB, dataL:dataR]
    boundedWindowArray['thisdata'] = testdata[dataT:dataB, dataL:dataR]

    # circle_template is kept outside the structured array (see above), so
    # slice it to the bounded window here
    locations_to_exclude = np.logical_or(circle_template[maskT:maskB, maskL:maskR],
                                         np.logical_or(boundedWindowArray['thisdata'] == 0,
                                                       boundedWindowArray['alternatedata'] == 0))
    # the places that are inside the circular mask and where both images
    # have data
    locations_to_include = ~locations_to_exclude
    number_available = np.count_nonzero(locations_to_include)

    # we only want to do the interpolation calculations from the nearest n
    # locations that have data available, n will be ~100 in reality
    number_required = 3

    data_to_use = boundedWindowArray[locations_to_include]

    if number_available > number_required:
        # argpartition seems to be v fast when number_required is
        # substantially < data_to_use.size
        # But partition on the structured array itself with order=['ranks']
        # is almost 2 orders of magnitude slower!
        reqIndices = np.argpartition(data_to_use['ranks'], number_required)[:number_required]
        data_to_use = np.take(data_to_use, reqIndices)
    else:
        # we just use the data as it is...
        pass

    # now do stuff with the selected data to calculate a value for the gap cell

Source: https://habr.com/ru/post/1204835/

