Vectorized haversine formula with a pandas DataFrame

I know that to find the distance between two latitude/longitude points I need to use the haversine function:

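    from math import radians, cos, sin, asin, sqrt

    def haversine(lon1, lat1, lon2, lat2):
        # convert decimal degrees to radians
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * asin(sqrt(a))
        km = 6367 * c  # 6367 is the Earth radius in km used here
        return km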

I have a DataFrame where one column is latitude and another is longitude. I want to know how far each of these points is from a given point, -56.7213600, 37.2175900. How do I take the values from the DataFrame and pass them to the function?

Example DataFrame:

        SEAZ     LAT          LON
    1   296.40,  58.7312210,  28.3774110
    2   274.72,  56.8148320,  31.2923240
    3   192.25,  52.0649880,  35.8018640
    4    34.34,  68.8188750,  67.1933670
    5   271.05,  56.6699880,  31.6880620
    6   131.88,  48.5546220,  49.7827730
    7   350.71,  64.7742720,  31.3953780
    8   214.44,  53.5192920,  33.8458560
    9     1.46,  67.9433740,  38.4842520
    10  273.55,  53.3437310,   4.4716664
1 answer

I can’t confirm the correctness of the calculations, but the following worked:

    In [11]:
    from numpy import cos, sin, arcsin, sqrt
    from math import radians

    def haversine(row):
        lon1 = -56.7213600
        lat1 = 37.2175900
        lon2 = row['LON']
        lat2 = row['LAT']
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * arcsin(sqrt(a))
        km = 6367 * c
        return km

    df['distance'] = df.apply(lambda row: haversine(row), axis=1)
    df

    Out[11]:
             SEAZ        LAT        LON     distance
    index
    1      296.40  58.731221  28.377411  6275.791920
    2      274.72  56.814832  31.292324  6509.727368
    3      192.25  52.064988  35.801864  6990.144378
    4       34.34  68.818875  67.193367  7357.221846
    5      271.05  56.669988  31.688062  6538.047542
    6      131.88  48.554622  49.782773  8036.968198
    7      350.71  64.774272  31.395378  6229.733699
    8      214.44  53.519292  33.845856  6801.670843
    9        1.46  67.943374  38.484252  6418.754323
    10     273.55  53.343731   4.471666  4935.394528
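Since the calculation is stated as unconfirmed, one inexpensive sanity check is to feed the same formula inputs whose answer is known in advance. The sketch below is only an illustration: the names `haversine_km` and `EARTH_RADIUS_KM` are hypothetical, and the radius of 6367 km is simply the value used in the code above. Two identical points should be 0 km apart, and two points on the equator 90 degrees of longitude apart should be a quarter of the circumference apart, that is R * pi / 2.

```python
from math import radians, sin, cos, asin, sqrt, pi

EARTH_RADIUS_KM = 6367  # assumption: same radius as in the answer's code


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Case 1: a point compared with itself.
zero = haversine_km(10.0, 20.0, 10.0, 20.0)

# Case 2: a quarter of the equator, which should equal R * pi / 2.
quarter = haversine_km(0.0, 0.0, 90.0, 0.0)
expected_quarter = EARTH_RADIUS_KM * pi / 2

print("same point:", zero)
print("quarter equator:", quarter, "expected:", expected_quarter)
```

If both cases agree with the expected values, the formula itself is at least wired up correctly; it says nothing about whether the LAT and LON columns were passed in the right order.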

The following code is actually slower on such a small DataFrame, but I applied it to a df of 100,000 rows:

    In [35]:
    %%timeit
    df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LON'])
    df['dLON'] = df['LON_rad'] - math.radians(-56.7213600)
    df['dLAT'] = df['LAT_rad'] - math.radians(37.2175900)
    df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(math.radians(37.2175900)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2))

    1 loops, best of 3: 17.2 ms per loop

By comparison, the apply version took 4.3 seconds, so this is almost 250 times faster, which is worth keeping in mind for the future.
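The same column-wise arithmetic can be wrapped in a function that accepts whole columns instead of single rows. This is a minimal sketch rather than the answerer's own code: `haversine_np` and `EARTH_RADIUS_KM` are hypothetical names, and the small frame holds only the first three rows of the example data. Because NumPy functions work element-wise on a pandas Series, the reference point can stay a plain scalar.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6367  # assumption: same radius as in the code above


def haversine_np(lon1, lat1, lon2, lat2):
    """Vectorized haversine; each argument may be a scalar, array or Series (degrees)."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# First three rows of the example DataFrame.
df = pd.DataFrame({
    "LAT": [58.7312210, 56.8148320, 52.0649880],
    "LON": [28.3774110, 31.2923240, 35.8018640],
})

# One call computes the distance for every row at once.
df["distance"] = haversine_np(-56.7213600, 37.2175900, df["LON"], df["LAT"])
print(df)
```

Keeping the formula in one function avoids creating the intermediate `LAT_rad`, `LON_rad`, `dLON` and `dLAT` columns on the frame.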

If we compress all of the above into one line:

    In [39]: %timeit df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin((np.radians(df['LAT']) - math.radians(37.21759))/2)**2 + math.cos(math.radians(37.21759)) * np.cos(np.radians(df['LAT'])) * np.sin((np.radians(df['LON']) - math.radians(-56.72136))/2)**2))

    100 loops, best of 3: 12.6 ms per loop

This gives a further speedup: about 341 times faster than the apply version.
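Outside IPython, where `%timeit` is not available, a comparison like this can be reproduced with `time.perf_counter`, and it is worth confirming that both versions return the same numbers before trusting the faster one. The sketch below makes several assumptions: the data is random rather than the original frame, `n_rows` is reduced to 10,000 to keep the run short, and the function names are hypothetical. The timings it prints depend on the machine and will not match the figures quoted above.

```python
import time

import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6367  # assumption: same radius as in the code above
REF_LON, REF_LAT = -56.7213600, 37.2175900  # reference point from the question


def _haversine(lon, lat):
    """Distance in km from the reference point; works on scalars and Series."""
    lon, lat = np.radians(lon), np.radians(lat)
    ref_lon, ref_lat = np.radians(REF_LON), np.radians(REF_LAT)
    a = (np.sin((lat - ref_lat) / 2) ** 2
         + np.cos(ref_lat) * np.cos(lat) * np.sin((lon - ref_lon) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def distance_apply(df):
    # Row-by-row: the function is called once per row.
    return df.apply(lambda row: _haversine(row["LON"], row["LAT"]), axis=1)


def distance_vectorized(df):
    # Column-wise: the function is called once on whole columns.
    return _haversine(df["LON"], df["LAT"])


# Random test data (hypothetical, not the frame from the question).
rng = np.random.default_rng(0)
n_rows = 10_000
df = pd.DataFrame({
    "LAT": rng.uniform(-90, 90, n_rows),
    "LON": rng.uniform(-180, 180, n_rows),
})

start = time.perf_counter()
slow = distance_apply(df)
t_apply = time.perf_counter() - start

start = time.perf_counter()
fast = distance_vectorized(df)
t_vectorized = time.perf_counter() - start

print(f"apply: {t_apply:.4f} s, vectorized: {t_vectorized:.4f} s")
print("results agree:", np.allclose(slow, fast))
```

Sharing one formula between both paths means any difference in the output would come from how the data is fed in, not from a typo in one of two copies of the expression.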


Source: https://habr.com/ru/post/1202243/

