Find closest point indices efficiently on a non-rectangular 2D grid

I have an irregular (non-rectangular) lon/lat grid and a bunch of points in lon/lat coordinates that should correspond to points on the grid (although they may be slightly displaced for numerical reasons). Now I need the indices of the corresponding lon/lat grid points.
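For reference, here is a minimal sketch of the kind of setup I mean (the grid construction and the perturbation are made up purely for illustration; only the shapes matter):

    import numpy as np

    # Curvilinear (non-rectangular) grid: lon and lat are both 2D arrays of shape (ny, nx).
    j, i = np.meshgrid(np.arange(4), np.arange(5), indexing='ij')
    lon = -11.0 - i - 0.1 * j          # made-up coordinates, just to get a skewed grid
    lat = -11.0 - 0.2 * i - 1.1 * j

    # Query points that sit (almost) on grid points, shifted slightly to mimic numerical noise.
    points = np.column_stack((lon.ravel()[:3], lat.ravel()[:3])) + 1e-9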

I wrote a function that does this, but it is REALLY slow.

    def find_indices(lon, lat, x, y):
        lonlat = np.dstack([lon, lat])
        delta = np.abs(lonlat - [x, y])
        ij_1d = np.linalg.norm(delta, axis=2).argmin()
        i, j = np.unravel_index(ij_1d, lon.shape)
        return i, j

    ind = [find_indices(lon, lat, *p) for p in points]

I am sure there is a better (and faster) solution in numpy/scipy. I have already googled a lot, but so far the answer has eluded me.

Any suggestions on how to efficiently find the indices of the corresponding (nearest) grid points?

PS: This question arose from another question of mine (see link).

Edit: solution

Based on @Cong Ma's answer, I found the following solution:

    def find_indices(points, lon, lat, tree=None):
        if tree is None:
            lon, lat = lon.T, lat.T
            lonlat = np.column_stack((lon.ravel(), lat.ravel()))
            tree = sp.spatial.cKDTree(lonlat)
        dist, idx = tree.query(points, k=1)
        ind = np.column_stack(np.unravel_index(idx, lon.shape))
        return [(i, j) for i, j in ind]

To put this solution, as well as Divakar's answer, into perspective, here are some timings of the function in which find_indices is used (and which is the speed bottleneck; see the link above):

    spatial_contour_frequency/pil0 : 331.9553
    spatial_contour_frequency/pil1 : 104.5771
    spatial_contour_frequency/pil2 :   2.3629
    spatial_contour_frequency/pil3 :   0.3287

pil0 is my initial approach, pil1 Divakar's, and pil2/pil3 the final solution above, where the tree is created on the fly in pil2 (i.e. on every iteration of the loop in which find_indices is embedded) and only once in pil3 (see the other question for details). Although Divakar's enhancement of my initial approach gives a nice 3x speed-up, cKDTree takes it to a whole new level with another 50x speed-up! And moving the tree creation out of the function makes things even faster.
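For completeness, a minimal sketch of the pil2 vs. pil3 pattern (list_of_point_sets is a made-up name; only where the tree is built matters, and passing the transposed arrays alongside the pre-built tree is my reading of the function above, so that unravel_index sees the shape the tree was built from):

    import numpy as np
    import scipy as sp
    import scipy.spatial

    # pil2-style: the tree is rebuilt inside find_indices on every call.
    ind_lists = [find_indices(pts, lon, lat) for pts in list_of_point_sets]

    # pil3-style: build the tree once, mirroring the branch inside find_indices,
    # and reuse it on every call.
    lon_t, lat_t = lon.T, lat.T
    lonlat = np.column_stack((lon_t.ravel(), lat_t.ravel()))
    tree = sp.spatial.cKDTree(lonlat)
    ind_lists = [find_indices(pts, lon_t, lat_t, tree=tree) for pts in list_of_point_sets]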

2 answers

If the points are sufficiently localized, you can try the scipy.spatial cKDTree implementation directly, as I discussed in another post. That post was about interpolation, but you can ignore that aspect and just use the query part.

tl;dr version:

Read the scipy.spatial.cKDTree documentation. Initialize the tree by passing it an (n, m)-shaped numpy ndarray; the tree will be built from the n m-dimensional coordinates.

 tree = scipy.spatial.cKDTree(array_of_coordinates) 

After that, use tree.query() to retrieve the k nearest neighbors (possibly with approximation and parallelization, see the docs), or use tree.query_ball_point() to find all neighbors within a given distance tolerance.
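A minimal sketch of both query styles, assuming a 2D array of coordinates and some made-up query points:

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    array_of_coordinates = rng.random((1000, 2))   # n points in m=2 dimensions
    query_points = rng.random((5, 2))              # made-up query points

    tree = cKDTree(array_of_coordinates)

    # Nearest neighbour: distances and flat indices into array_of_coordinates.
    dist, idx = tree.query(query_points, k=1)

    # All neighbours within a radius of 0.05 (one list of indices per query point).
    neighbours = tree.query_ball_point(query_points, r=0.05)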

If the points are not well localized and the spherical curvature / non-trivial topology kicks in, you can try to divide the manifold into several pieces, each of which is small enough to be considered local.
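An alternative that sidesteps the curvature issue without splitting (not part of the answer above, just a commonly used trick) is to convert lon/lat to 3D Cartesian coordinates on the unit sphere and build the tree on those; Euclidean (chord) distance in 3D then orders neighbours the same way as great-circle distance:

    import numpy as np
    from scipy.spatial import cKDTree

    def lonlat_to_xyz(lon_deg, lat_deg):
        """Map lon/lat in degrees to points on the unit sphere."""
        lon = np.radians(np.asarray(lon_deg))
        lat = np.radians(np.asarray(lat_deg))
        return np.column_stack((np.cos(lat) * np.cos(lon),
                                np.cos(lat) * np.sin(lon),
                                np.sin(lat)))

    # Build the tree on 3D coordinates of the flattened grid, query with the 3D
    # coordinates of the target points, then unravel back to 2D grid indices.
    tree = cKDTree(lonlat_to_xyz(lon.ravel(), lat.ravel()))
    _, idx = tree.query(lonlat_to_xyz(points[:, 0], points[:, 1]), k=1)
    ij = np.column_stack(np.unravel_index(idx, lon.shape))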


Here's a generic vectorized approach using scipy.spatial.distance.cdist -

    import numpy as np
    import scipy.spatial

    # Stack lon and lat arrays as columns to form an Nx2 array,
    # where N is the total number of grid points
    lonlat = np.column_stack((lon.ravel(), lat.ravel()))

    # Compute all grid-to-point distances and take the argmin along the N axis
    idx = scipy.spatial.distance.cdist(lonlat, points).argmin(0)

    # Convert the flat indices back to the grid shape for the final output
    ind = np.column_stack(np.unravel_index(idx, lon.shape)).tolist()

Sample run -

    In [161]: lon
    Out[161]:
    array([[-11.   ,  -7.82 ,  -4.52 ,  -1.18 ,   2.19 ],
           [-12.   ,  -8.65 ,  -5.21 ,  -1.71 ,   1.81 ],
           [-13.   ,  -9.53 ,  -5.94 ,  -2.29 ,   1.41 ],
           [-14.1  ,  -0.04 ,  -6.74 ,  -2.91 ,   0.976]])

    In [162]: lat
    Out[162]:
    array([[-11.2  ,  -7.82 ,  -4.51 ,  -1.18 ,   2.19 ],
           [-12.   ,  -8.63 ,  -5.27 ,  -1.71 ,   1.81 ],
           [-13.2  ,  -9.52 ,  -5.96 ,  -2.29 ,   1.41 ],
           [-14.3  ,  -0.06 ,  -6.75 ,  -2.91 ,   0.973]])

    In [163]: lonlat = np.column_stack((lon.ravel(),lat.ravel()))

    In [164]: idx = scipy.spatial.distance.cdist(lonlat,points).argmin(0)

    In [165]: np.column_stack((np.unravel_index(idx,lon.shape))).tolist()
    Out[165]: [[0, 4], [0, 4], [0, 4], [0, 4], [0, 4], [0, 4], [3, 3]]

Runtime Tests -

Define functions:

    def find_indices(lon, lat, x, y):
        lonlat = np.dstack([lon, lat])
        delta = np.abs(lonlat - [x, y])
        ij_1d = np.linalg.norm(delta, axis=2).argmin()
        i, j = np.unravel_index(ij_1d, lon.shape)
        return i, j

    def loopy_app(lon, lat, pts):
        return [find_indices(lon, lat, pts[i,0], pts[i,1]) for i in range(pts.shape[0])]

    def vectorized_app(lon, lat, points):
        lonlat = np.column_stack((lon.ravel(), lat.ravel()))
        idx = scipy.spatial.distance.cdist(lonlat, points).argmin(0)
        return np.column_stack(np.unravel_index(idx, lon.shape)).tolist()

Timings:

    In [179]: lon = np.random.rand(100,100)

    In [180]: lat = np.random.rand(100,100)

    In [181]: points = np.random.rand(50,2)

    In [182]: %timeit loopy_app(lon,lat,points)
    10 loops, best of 3: 47 ms per loop

    In [183]: %timeit vectorized_app(lon,lat,points)
    10 loops, best of 3: 16.6 ms per loop

To squeeze out more performance, np.concatenate can be used in place of np.column_stack.
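For example, the stacking line could be rewritten as follows (a minor sketch; the result is identical, the concatenate version just avoids some of column_stack's overhead):

    # Equivalent to np.column_stack((lon.ravel(), lat.ravel())),
    # built directly with np.concatenate on two (N, 1) views.
    lonlat = np.concatenate((lon.ravel()[:, None], lat.ravel()[:, None]), axis=1)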


Source: https://habr.com/ru/post/1232803/

