Find the closest point using numpy

I have two sets of geocodes stored as pandas Series, and I'm trying to find the fastest way to get the minimum Euclidean distance from each point in set A to the points in set B. That is: the closest point in the second set to (40.748043, -73.992953), and so on. I would really appreciate any suggestions or help.

Set A:

    print(latitude1)
    print(longitude1)

    0    40.748043
    1    42.361016
    Name: latitude, dtype: float64
    0   -73.992953
    1   -71.020005
    Name: longitude, dtype: float64

Set B:

    print(latitude2)
    print(longitude2)

    0    42.50729
    1    42.50779
    2    25.56473
    3    25.78953
    4    25.33132
    5    25.06570
    6    25.59246
    7    25.61955
    8    25.33737
    9    24.11028
    Name: latitude, dtype: float64
    0     1.53414
    1     1.52109
    2    55.55517
    3    55.94320
    4    56.34199
    5    55.17128
    6    56.26176
    7    56.27291
    8    55.41206
    9    52.73056
    Name: longitude, dtype: float64
3 answers

Here is one way, using only numpy.linalg.norm.

    import pandas as pd, numpy as np

    df1['coords1'] = list(zip(df1['latitude1'], df1['longitude1']))
    df2['coords2'] = list(zip(df2['latitude2'], df2['longitude2']))

    def calc_min(x):
        amin = np.argmin([np.linalg.norm(np.array(x)-np.array(y)) for y in df2['coords2']])
        return df2['coords2'].iloc[amin]

    df1['closest'] = df1['coords1'].map(calc_min)

    #    latitude1  longitude1                  coords1              closest
    # 0  40.748043  -73.992953  (40.748043, -73.992953)  (42.50779, 1.52109)
    # 1  42.361016  -71.020005  (42.361016, -71.020005)  (42.50779, 1.52109)
    # 2  25.361016   54.000000         (25.361016, 54.0)  (25.0657, 55.17128)

Setup

    from io import StringIO

    mystr1 = """latitude1|longitude1
    40.748043|-73.992953
    42.361016|-71.020005
    25.361016|54.0000
    """

    mystr2 = """latitude2|longitude2
    42.50729|1.53414
    42.50779|1.52109
    25.56473|55.55517
    25.78953|55.94320
    25.33132|56.34199
    25.06570|55.17128
    25.59246|56.26176
    25.61955|56.27291
    25.33737|55.41206
    24.11028|52.73056"""

    df1 = pd.read_csv(StringIO(mystr1), sep='|')
    df2 = pd.read_csv(StringIO(mystr2), sep='|')

If performance is an issue, you can easily vectorize this calculation by working on the underlying numpy arrays.
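A minimal sketch of what working directly on numpy arrays might look like, using broadcasting to build the full pairwise distance matrix. The function name closest_by_broadcast is hypothetical (it is not part of the answer above), and the inputs are assumed to be (n, 2) arrays of (latitude, longitude) rows taken from the question's data.

```python
import numpy as np

def closest_by_broadcast(a, b):
    """For each row of a, return the index in b and the distance of its nearest row.

    Hypothetical helper: a and b are assumed to be (n, 2) and (m, 2) float arrays.
    """
    # Pairwise differences via broadcasting: shape (len(a), len(b), 2)
    diff = a[:, None, :] - b[None, :, :]
    # Euclidean distance matrix: shape (len(a), len(b))
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Column index of the smallest distance in each row
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(a)), idx]

# Set A and Set B from the question
a = np.array([[40.748043, -73.992953],
              [42.361016, -71.020005]])
b = np.array([[42.50729, 1.53414], [42.50779, 1.52109],
              [25.56473, 55.55517], [25.78953, 55.94320],
              [25.33132, 56.34199], [25.06570, 55.17128],
              [25.59246, 56.26176], [25.61955, 56.27291],
              [25.33737, 55.41206], [24.11028, 52.73056]])

idx, min_dist = closest_by_broadcast(a, b)
closest = b[idx]  # nearest point in B for each point in A
```

This builds a len(a) by len(b) matrix in memory, so it is probably only suitable when both sets are of moderate size.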


You can try using the geopy library.

https://pypi.python.org/pypi/geopy

Here is an example from the documentation.

    >>> from geopy.distance import vincenty
    >>> newport_ri = (41.49008, -71.312796)
    >>> cleveland_oh = (41.499498, -81.695391)
    >>> print(vincenty(newport_ri, cleveland_oh).miles)
    538.3904451566326

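Here vincenty is the distance computed with Vincenty's formulae. (In geopy 2.0 and later, vincenty has been removed; geopy.distance.geodesic is the replacement.)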

https://en.wikipedia.org/wiki/Vincenty%27s_formulae
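If geopy is not an option, a rough alternative is the haversine (great-circle) formula, which needs only the standard library. Note that this is a different and less accurate technique than Vincenty's formulae, since it treats the Earth as a sphere. The names haversine_miles and EARTH_RADIUS_MILES are hypothetical, and the radius value is an assumed mean Earth radius.

```python
from math import radians, sin, cos, asin, sqrt

# Assumed mean Earth radius in miles (spherical approximation)
EARTH_RADIUS_MILES = 3958.8

def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))

newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)
d = haversine_miles(newport_ri, cleveland_oh)

# Nearest point of a candidate list by great-circle distance
candidates = [cleveland_oh, (42.361016, -71.020005)]
nearest = min(candidates, key=lambda q: haversine_miles(newport_ri, q))
```

The result for the two cities should land within a few miles of the Vincenty figure shown above; the small gap comes from the spherical approximation.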


For closest-point calculations like this, an efficient method is usually a fast nearest-neighbor lookup based on a kd-tree. Using the Cython-powered implementation in SciPy (scipy.spatial.cKDTree), one approach would be:

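    import numpy as np
    import pandas as pd
    from scipy.spatial import cKDTree

    def closest_pts(setA_lat, setA_lng, setB_lat, setB_lng):
        # Underlying numpy arrays of the input Series
        a_x = setA_lat.values
        a_y = setA_lng.values
        b_x = setB_lat.values
        b_y = setB_lng.values

        # Stack into (n, 2) arrays of (lat, lng) points
        a = np.c_[a_x, a_y]
        b = np.c_[b_x, b_y]

        # Index of the nearest point in B for each point in A
        indx = cKDTree(b).query(a,k=1)[1]
        return pd.Series(b_x[indx]), pd.Series(b_y[indx])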

Sample run -

1) Inputs:

    In [106]: setA_lat
    Out[106]:
    0    40.748043
    1    42.361016
    dtype: float64

    In [107]: setA_lng
    Out[107]:
    0   -73.992953
    1   -71.020005
    dtype: float64

    In [108]: setB_lat
    Out[108]:
    0    42.460000
    1     0.645894
    2     0.437587
    3    40.460000
    4     0.963663
    dtype: float64

    In [109]: setB_lng
    Out[109]:
    0   -71.000000
    1     0.925597
    2     0.071036
    3   -72.000000
    4     0.020218
    dtype: float64

2) Outputs:

    In [110]: c_x,c_y = closest_pts(setA_lat, setA_lng, setB_lat, setB_lng)

    In [111]: c_x
    Out[111]:
    0    40.46
    1    42.46
    dtype: float64

    In [112]: c_y
    Out[112]:
    0   -72.0
    1   -71.0
    dtype: float64
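Since the question asks for the minimum distance itself, it may help to note that cKDTree.query returns the distances as well as the indices. A minimal sketch on the sample inputs above, assuming plain (n, 2) numpy arrays instead of pandas Series:

```python
import numpy as np
from scipy.spatial import cKDTree

# Sample inputs from the run above, as (lat, lng) rows
a = np.array([[40.748043, -73.992953],
              [42.361016, -71.020005]])
b = np.array([[42.46, -71.0],
              [0.645894, 0.925597],
              [0.437587, 0.071036],
              [40.46, -72.0],
              [0.963663, 0.020218]])

# query returns a pair: Euclidean distances and indices of the nearest neighbors
dist, indx = cKDTree(b).query(a, k=1)

closest = b[indx]  # nearest point in B for each point in A
```

Here dist[i] is the Euclidean distance, in degrees, from a[i] to its nearest neighbor closest[i].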

Source: https://habr.com/ru/post/1275921/

