Latitude / Longitude Based JOIN

Question

Given the following tables:

    table A (id, latitude, longitude)
    table B (id, latitude, longitude)

How can I write an efficient T-SQL query that associates each row in A with the nearest row in B?

The result set should contain all the rows in A and associate each of them with exactly one row in B. The format I'm looking for is the following:

 (A.id, B.id, distanceAB)

I have a function that calculates the distance between two latitude/longitude pairs. I tried something using order by ... limit 1 and/or rank() over (partition by ...) as rowCount ... where rowCount = 1, but the result is either not quite what I need or takes too long to return.

Am I missing something?

sql join tsql haversine distance

Gevorg Jan 20 '12 at 21:20

3 answers

Chad · Answer 1 · 2012-01-21T02:14:12+0000

There is no getting around the fact that you have to compare each entry in A with each entry in B, which obviously will not scale well if both A and B contain many entries.

This will return the correct results:

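    -- Rank every B row per A row by great-circle distance (km) and keep the closest.
    -- dense_rank() keeps ties; use row_number() if exactly one row per A is required.
    SELECT aid, bid, distanceAB
    FROM (
        SELECT aid, bid, distanceAB,
               dense_rank() OVER (PARTITION BY aid ORDER BY distanceAB) AS n
        FROM (
            SELECT A.id AS aid, B.id AS bid,
                   acos(sin(radians(A.latitude)) * sin(radians(B.latitude))
                      + cos(radians(A.latitude)) * cos(radians(B.latitude))
                      * cos(radians(A.longitude - B.longitude))) * 6372.8 AS distanceAB
            FROM A CROSS JOIN B
        ) C
    ) D
    WHERE n = 1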

This returns within a reasonable time if your sets are not too large. With 3 places in A and 130,000 or so in B, it takes about one second on my machine. With 1,000 entries in each, it takes about 40 seconds. As I said, it does not scale well.
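Since the question asks for one and only one B row per A row, here is a minimal sketch of the same ranking idea with row_number() in place of dense_rank(), so that two equidistant B rows cannot both come back. It assumes tables A and B with float columns (id, latitude, longitude) as described in the question; breaking ties by the smaller B.id is an arbitrary choice made for illustration.

```sql
-- Sketch: exactly one B row per A row, ties broken by the smaller B.id.
-- Assumes A and B have float columns (id, latitude, longitude).
SELECT aid, bid, distanceAB
FROM (
    SELECT a.id AS aid,
           b.id AS bid,
           d.distanceAB,
           ROW_NUMBER() OVER (PARTITION BY a.id
                              ORDER BY d.distanceAB, b.id) AS n
    FROM A AS a
    CROSS JOIN B AS b
    -- Compute the distance once (km, Earth radius 6372.8) and reuse it by name.
    CROSS APPLY (VALUES (
        ACOS(SIN(RADIANS(a.latitude)) * SIN(RADIANS(b.latitude))
           + COS(RADIANS(a.latitude)) * COS(RADIANS(b.latitude))
           * COS(RADIANS(a.longitude - b.longitude))) * 6372.8
    )) AS d(distanceAB)
) AS ranked
WHERE n = 1;
```

This still compares every A row with every B row, so it should scale no better than the dense_rank() version; it only changes how ties are handled.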

It should be noted that Sparky's answer may lead to incorrect results in certain circumstances. Suppose your A location is at (+40, +100). A B location at (+40, +111) will not be returned, even though it is closer than one at (+49, +109).
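The example can be checked with a small self-contained query. This is only a sketch: the inline VALUES rows, the column names lat, lon and label, and the insideBox flag are made up for illustration, and the box is the ±10 window from Sparky's answer.

```sql
-- Sketch: compare the two candidate B locations against an A location at (+40, +100).
-- Distances in km (Earth radius 6372.8); insideBox = 1 if the +/-10 filter keeps the row.
SELECT b.label,
       ACOS(SIN(RADIANS(a.lat)) * SIN(RADIANS(b.lat))
          + COS(RADIANS(a.lat)) * COS(RADIANS(b.lat))
          * COS(RADIANS(a.lon - b.lon))) * 6372.8 AS distanceKm,
       CASE WHEN b.lat BETWEEN a.lat - 10 AND a.lat + 10
             AND b.lon BETWEEN a.lon - 10 AND a.lon + 10
            THEN 1 ELSE 0 END AS insideBox
FROM (VALUES (CAST(40 AS float), CAST(100 AS float))) AS a(lat, lon)
CROSS JOIN (VALUES ('+40, +111', CAST(40 AS float), CAST(111 AS float)),
                   ('+49, +109', CAST(49 AS float), CAST(109 AS float))
           ) AS b(label, lat, lon)
ORDER BY distanceKm;
```

The (+40, +111) row should sort first, yet it is the one flagged insideBox = 0, which is exactly the case the bounding-box filter misses.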

Sparky · Answer 2 · 2012-01-20T23:23:05+0000

This is one approach that should perform reasonably well, but the big caveat is that it may not find any results.

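    -- dbo.yourFunction() stands in for your own distance function.
    -- As written, top 1 yields one row for the whole result, not one per row of A.
    select top 1 a.id, b.id, dbo.yourFunction() as DistanceAB
    from a
    join b
      on b.latitude between a.latitude - 10 and a.latitude + 10
     and b.longitude between a.longitude - 10 and a.longitude + 10
    order by 3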

What you basically do is look for any row in B that falls inside a box about 20 units across centred on A, and then sort by your function to determine the closest. You can adjust the size of the box as needed. Although this is not exact, it should reduce the size of the result set and give you decent results.
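To get one nearest match for every row of A rather than a single row overall, the box filter can be applied once per A row. The following is a rough sketch, not a tuned implementation: it assumes tables A and B with float columns (id, latitude, longitude), inlines a great-circle formula where your own distance function would go, and uses OUTER APPLY so that A rows with nothing inside the box still appear, with NULLs.

```sql
-- Sketch: nearest B row inside a +/-10 box, evaluated once per A row.
-- Assumes A and B have float columns (id, latitude, longitude).
SELECT a.id AS aid, n.bid, n.distanceAB
FROM A AS a
OUTER APPLY (
    SELECT TOP (1)
           b.id AS bid,
           -- Clamp to [-1, 1] so rounding error cannot push ACOS out of its domain.
           ACOS(CASE WHEN c.cosAngle > 1 THEN 1
                     WHEN c.cosAngle < -1 THEN -1
                     ELSE c.cosAngle END) * 6372.8 AS distanceAB
    FROM B AS b
    CROSS APPLY (VALUES (
        SIN(RADIANS(a.latitude)) * SIN(RADIANS(b.latitude))
      + COS(RADIANS(a.latitude)) * COS(RADIANS(b.latitude))
      * COS(RADIANS(a.longitude - b.longitude))
    )) AS c(cosAngle)
    WHERE b.latitude  BETWEEN a.latitude  - 10 AND a.latitude  + 10
      AND b.longitude BETWEEN a.longitude - 10 AND a.longitude + 10
    ORDER BY distanceAB, b.id
) AS n;
```

An index on B (latitude, longitude) would presumably help the box filter, but the result is still only as good as the box: a closer row just outside it is never considered.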

tpolyak · Answer 3 · 2012-01-20T23:43:47+0000

This is possible by joining two subqueries. The first contains the distance between every pair of A and B locations; the second contains only the minimum distance to a B location for each A location.

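    -- x: every A/B pair with its distance; y: the minimum distance per A row.
    -- The distance is a flat approximation in degrees; substitute your own function for real distances.
    SELECT x.aid, x.bid, x.Distance
    FROM (SELECT A.ID AS aid, B.ID AS bid,
                 SQRT(SQUARE(A.Latitude - B.Latitude) + SQUARE(A.Longitude - B.Longitude)) AS Distance
          FROM LocationsA AS A CROSS JOIN LocationsB AS B) x
    JOIN (SELECT A.ID AS aid,
                 MIN(SQRT(SQUARE(A.Latitude - B.Latitude) + SQUARE(A.Longitude - B.Longitude))) AS Distance
          FROM LocationsA AS A CROSS JOIN LocationsB AS B
          GROUP BY A.ID) y
      ON x.aid = y.aid AND x.Distance = y.Distance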
