I have two separate data sets, df and df2. Each has Longitude and Latitude columns. For each point in df2, I'm trying to find the closest point in df, calculate the distance between them in km, and store that value in a new column of df2.
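For reference, this is the haversine great-circle distance that the loop computes. Latitudes $\varphi$ and longitudes $\lambda$ must be in radians, and $R = 6373$ km is the radius used in the code. Because $c$, and therefore $d$, grows monotonically with $a$, the closest point is also the one with the smallest $a$.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\
d &= R\,c
\end{aligned}
```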
I came up with a solution, but df has more than 700,000 rows and df2 has about 60,000 rows, so it takes far too long to run. The only approach I could come up with is a double for loop:
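from math import radians, sin, cos, sqrt, atan2

def compute_shortest_dist(df, df2):
    shortest_dist = []
    R = 6373.0  # Earth radius in km
    for i in df2.index:
        min_dist = -1
        # .ix has been removed from pandas; use .loc, and convert degrees to radians
        lat1 = radians(df2.loc[i, 'Latitude'])
        lon1 = radians(df2.loc[i, 'Longitude'])
        for j in df.index:
            lat2 = radians(df.loc[j, 'Latitude'])
            lon2 = radians(df.loc[j, 'Longitude'])
            dlon = lon2 - lon1
            dlat = lat2 - lat1
            a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
            c = 2 * atan2(sqrt(a), sqrt(1 - a))
            distance = R * c
            # keep the smallest distance seen so far
            if min_dist == -1 or distance < min_dist:
                min_dist = distance
        shortest_dist.append(min_dist)
    return shortest_dist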
How can I speed this up? Is there perhaps something in pandas that does this?
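A minimal sketch of one way to remove the Python-level loops, assuming NumPy and pandas and the Latitude/Longitude column names from the code above. The names nearest_distance_km and chunk_size are hypothetical. It still compares every pair (700,000 × 60,000 ≈ 4.2×10^10 pairs), but does the work in vectorized chunks. Because the distance grows with a, it takes the minimum of a before applying the remaining trigonometry.

```python
import numpy as np
import pandas as pd

R_KM = 6373.0  # Earth radius in km, same value as in the loop version


def nearest_distance_km(df: pd.DataFrame, df2: pd.DataFrame, chunk_size: int = 32) -> np.ndarray:
    """For each row of df2, return the haversine distance (km) to the closest row of df.

    Brute force, but vectorized: a chunk of df2 rows is compared against all of df at once.
    Each temporary array holds chunk_size * len(df) float64 values
    (about 180 MB for chunk_size=32 and len(df)=700,000), so keep chunk_size small.
    """
    # Reference points (df), converted to radians once.
    lat_ref = np.radians(df['Latitude'].to_numpy(dtype=float))
    lon_ref = np.radians(df['Longitude'].to_numpy(dtype=float))
    cos_ref = np.cos(lat_ref)

    # Query points (df2).
    lat_q = np.radians(df2['Latitude'].to_numpy(dtype=float))
    lon_q = np.radians(df2['Longitude'].to_numpy(dtype=float))

    result = np.empty(len(df2))
    for start in range(0, len(df2), chunk_size):
        stop = start + chunk_size
        # Shape (chunk, 1) broadcast against (len(df),) gives (chunk, len(df)).
        lat1 = lat_q[start:stop, None]
        lon1 = lon_q[start:stop, None]
        a = (np.sin((lat_ref - lat1) / 2) ** 2
             + np.cos(lat1) * cos_ref * np.sin((lon_ref - lon1) / 2) ** 2)
        # Smallest a per query row gives the smallest distance; clip guards rounding past [0, 1].
        a_min = np.clip(a.min(axis=1), 0.0, 1.0)
        result[start:stop] = R_KM * 2 * np.arctan2(np.sqrt(a_min), np.sqrt(1 - a_min))
    return result
```

Usage would be along the lines of `df2['dist_km'] = nearest_distance_km(df, df2)` (the column name is only an example); the returned array is in df2's row order. To avoid the all-pairs comparison entirely, a spatial tree is the usual alternative, for example scikit-learn's `BallTree` with `metric='haversine'`, which expects `[latitude, longitude]` in radians and returns distances in radians (multiply by R to get km).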