Nearest adjacent one-dimensional data with a given range

Question

Nearest adjacent one-dimensional data with a given range

I have two nested lists A and B:

A = [[50,140],[51,180],[54,500],......]

B = [[50.1, 170], [51,200],[55,510].....]

The 1st element in each internal list runs from 0 to about 1e5, the 0th element runs from 50 to 700, these elements are not sorted. What I want to do is run through each element in [n] [1] and find the closest element in B [n] [1], but when looking for the nearest neighbor I only want to search in the interval defined by A [n] [0 ] plus or minus 0.5.

I used the function:

 def find_nearest_vector(array, value): idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin() return array[idx]

What does the nearest neighbor find between the coordinates A[0][:] and B[0][:] , for example. However, I need to limit the search range to a rectangle around a small shift in the value A [0] [0]. In addition, I do not want to reuse the elements - I want a unique bijection between each value of A [n] [1] and B [n] [1] p between the interval A [n] [0] +/- 0.5 >

I am trying to use Scipy KDTree, but this repeats the elements and I don’t know how to limit the search range. In fact, I want to do a one-dimensional search for NNN in a two-dimensional nested list along a certain axis, where the neighborhood in which the NNN search is inside the hyper-rectangle defined by the 0th element in each internal list is plus or minus a slight shift.

+6

python algorithm numpy scipy nearest-neighbor

Holtz 15 sept. '13 at 20:12

source share

2 answers

My understanding of your problem is that you are trying to find the closest elements for each A[n][1] in a different set of points ( B[i][1] limited to points if A[n][0] is in within +/- 0.5 from B[i][0] ).

(I'm not familiar with numpy or scipy, and I'm sure there is a better way to do this with their algorithms.)

Saying this, here is my naive implementation in O(a*b*log(a*b)) time.

 def main(a,b): for a_bound,a_val in a: dist_to_valid_b_points = {abs(a_val-b_val):(b_bound,b_val) for b_bound,b_val in b if are_within_bounds(a_bound,b_bound)} print get_closest_point((a_bound, a_val),dist_to_valid_b_points) def are_within_bounds(a_bound, b_bound): return abs(b_bound-a_bound) < 0.5 def get_closest_point(a_point, point_dict): return (a_point, None if not point_dict else point_dict[min(point_dict, key=point_dict.get)])

main([[50,140],[51,180],[54,500]],[[50.1, 170], [51,200],[55,510]]) displays the following result:

 ((50, 140), (50.1, 170)) ((51, 180), (51, 200)) ((54, 500), None)

0

Steven liao 15 sept. '13 at 21:04

source share

Hyry · Accepted Answer · 2013-09-16T00:56:53+0000

I use numpy.argsort() , numpy.searchsorted() , numpy.argmin() to search.

 %pylab inline import numpy as np np.random.seed(0) A = np.random.rand(5, 2) B = np.random.rand(100, 2) xaxis_range = 0.02 order = np.argsort(B[:, 0]) bx = B[order, 0] sidx = np.searchsorted(bx, A[:, 0] - xaxis_range, side="right") eidx = np.searchsorted(bx, A[:, 0] + xaxis_range, side="left") result = [] for s, e, ay in zip(sidx, eidx, A[:, 1]): section = order[s:e] by = B[section, 1] idx = np.argmin(np.abs(ay-by)) result.append(B[section[idx]]) result = np.array(result)

I will construct the result as follows:

 plot(A[:, 0], A[:, 1], "o") plot(B[:, 0], B[:, 1], ".") plot(result[:, 0], result[:, 1], "x")

output:

Nearest adjacent one-dimensional data with a given range

More articles: