Need help optimizing a lat/lon geo search in MySQL

I have a MyISAM table in MySQL 5.0.22 with approximately 300,000 records, and I want to do a lat/lon distance search within a five-mile radius.

I have an index covering the lat/lon fields, and the query is fast (millisecond response) when I select only lat/lon. But as soon as I select additional fields from the table, it slows down to 5-8 seconds.

I'm using MyISAM so that I can use full-text search. My other indexes perform well (for example, SELECT * FROM Listing WHERE slug = 'xxxxx').

How can I optimize my query, table, or indexes to speed things up?

My schema:

CREATE TABLE `Listing` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(125) collate utf8_unicode_ci default NULL,
  `phone` varchar(18) collate utf8_unicode_ci default NULL,
  `fax` varchar(18) collate utf8_unicode_ci default NULL,
  `email` varchar(55) collate utf8_unicode_ci default NULL,
  `photourl` varchar(55) collate utf8_unicode_ci default NULL,
  `thumburl` varchar(5) collate utf8_unicode_ci default NULL,
  `website` varchar(85) collate utf8_unicode_ci default NULL,
  `categoryid` int(10) unsigned default NULL,
  `addressid` int(10) unsigned default NULL,
  `deleted` tinyint(1) default NULL,
  `status` int(10) unsigned default '2',
  `parentid` int(10) unsigned default NULL,
  `organizationid` int(10) unsigned default NULL,
  `listinginfoid` int(10) unsigned default NULL,
  `createuserid` int(10) unsigned default NULL,
  `createdate` datetime default NULL,
  `lasteditdate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `lastedituserid` int(10) unsigned default NULL,
  `slug` varchar(155) collate utf8_unicode_ci default NULL,
  `aclid` int(10) unsigned default NULL,
  `alt_address` varchar(80) collate utf8_unicode_ci default NULL,
  `alt_website` varchar(80) collate utf8_unicode_ci default NULL,
  `lat` decimal(10,7) default NULL,
  `lon` decimal(10,7) default NULL,
  `city` varchar(80) collate utf8_unicode_ci default NULL,
  `state` varchar(10) collate utf8_unicode_ci default NULL,
  PRIMARY KEY (`id`),
  KEY `idx_fetch` USING BTREE (`slug`,`deleted`),
  KEY `idx_loc` (`state`,`city`),
  KEY `idx_org` (`organizationid`,`status`,`deleted`),
  KEY `idx_geo_latlon` USING BTREE (`status`,`lat`,`lon`),
  FULLTEXT KEY `idx_name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=DYNAMIC;

My query:

SELECT Listing.name, Listing.categoryid, Listing.lat, Listing.lon,
       3956 * 2 * ASIN(SQRT(
           POWER(SIN((Listing.lat - 37.369195) * pi()/180 / 2), 2)
         + COS(Listing.lat * pi()/180) * COS(37.369195 * pi()/180)
         * POWER(SIN((Listing.lon - -122.036849) * pi()/180 / 2), 2)
       )) AS rawgeosearchdistance
FROM Listing
WHERE Listing.status = '2'
  AND (Listing.lon BETWEEN -122.10913433498 AND -121.96456366502)
  AND (Listing.lat BETWEEN 37.296909665016 AND 37.441480334984)
HAVING rawgeosearchdistance < 5
ORDER BY rawgeosearchdistance ASC;

EXPLAIN plan without the geo-search:

  +----+-------------+---------+-------+----------------+----------------+---------+------+------+-------------+
  | id | select_type | table   | type  | possible_keys  | key            | key_len | ref  | rows | Extra       |
  +----+-------------+---------+-------+----------------+----------------+---------+------+------+-------------+
  |  1 | SIMPLE      | Listing | range | idx_geo_latlon | idx_geo_latlon | 19      | NULL |  453 | Using where |
  +----+-------------+---------+-------+----------------+----------------+---------+------+------+-------------+

EXPLAIN plan with the geo-search:

  +----+-------------+---------+-------+----------------+----------------+---------+------+------+-----------------------------+
  | id | select_type | table   | type  | possible_keys  | key            | key_len | ref  | rows | Extra                       |
  +----+-------------+---------+-------+----------------+----------------+---------+------+------+-----------------------------+
  |  1 | SIMPLE      | Listing | range | idx_geo_latlon | idx_geo_latlon | 19      | NULL |  453 | Using where; Using filesort |
  +----+-------------+---------+-------+----------------+----------------+---------+------+------+-----------------------------+

Here is the EXPLAIN plan with a covering index. Having the columns in the correct order makes a big difference:

  +----+-------------+---------+-------+---------------+---------------+---------+------+------+------------------------------------------+
  | id | select_type | table   | type  | possible_keys | key           | key_len | ref  | rows | Extra                                    |
  +----+-------------+---------+-------+---------------+---------------+---------+------+------+------------------------------------------+
  |  1 | SIMPLE      | Listing | range | idx_geo_cover | idx_geo_cover | 12      | NULL |  453 | Using where; Using index; Using filesort |
  +----+-------------+---------+-------+---------------+---------------+---------+------+------+------------------------------------------+

Thanks!

+4
5 answers

Your lat/lon-only query is probably using a "covering index". A covering index is when the index used by the query contains all the data the query selects; MySQL then only has to visit the index, never the data rows. That explains why the lat/lon-only query is so fast.

I suspect it's the distance calculation plus the sheer number of rows being returned that slows down the longer query (plus any temporary table that has to be created for the HAVING clause).
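If you want the full query to benefit the same way, one option is to widen the geo index so it covers every column the query touches. This is only a sketch against the schema above, not something from the original post (although the poster's idx_geo_cover EXPLAIN suggests they tried something similar); note that wide utf8 varchar columns make for a bulky index:

    -- Hypothetical covering index for the radius query: filter columns
    -- first (status, lat, lon), then the selected columns (name,
    -- categoryid), so MySQL can answer from the index alone.
    ALTER TABLE Listing
      ADD KEY idx_geo_cover (`status`, `lat`, `lon`, `name`, `categoryid`);

The third EXPLAIN above shows the effect of a covering index on the plan: still a filesort, but "Using index" means no row lookups.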

+1

I think you should consider using PostgreSQL (in conjunction with PostGIS).

I abandoned MySQL for geospatial data (for now) for the following reasons:

  • MySQL only supports spatial data types / spatial indexes in MyISAM tables, which brings MyISAM's inherent flaws with it (no transactions, no referential integrity, ...).
  • MySQL implements only part of the OpenGIS specification, and solely on the basis of MBRs (minimum bounding rectangles), which is pretty useless for most serious geospatial query processing (see the MySQL manual). Chances are you will need some of that missing functionality sooner rather than later.

PostgreSQL/PostGIS with proper (GiST) spatial indexes and well-formed queries can be extremely fast.

Example: determining the overlapping polygons between a "small" selection of polygons and a table of more than 5 million (!) very complex polygons, computing the amount of overlap between those results, plus sorting. Average run time: 30 to 100 milliseconds (on a machine with plenty of spare capacity, mind you; remember to tune your PostgreSQL installation... read the docs).
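For the original poster's use case, a radius search in PostGIS could look something like the sketch below. The table and column names (listing, geom) are assumptions of mine, not from the post; with a geography(Point,4326) column, ST_DWithin takes meters (5 miles ≈ 8047 m) and can use the GiST index:

    -- Assumed schema: listing(id, name, geom geography(Point,4326))
    CREATE INDEX idx_listing_geom ON listing USING GIST (geom);

    -- Everything within 5 miles of the search point, nearest first.
    SELECT id, name,
           ST_Distance(geom,
               ST_SetSRID(ST_MakePoint(-122.036849, 37.369195), 4326)::geography) AS meters
    FROM listing
    WHERE ST_DWithin(geom,
               ST_SetSRID(ST_MakePoint(-122.036849, 37.369195), 4326)::geography,
               8046.7)
    ORDER BY meters;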

+4

You really should avoid doing that math in your SELECT statement; it is probably the source of a lot of the slowdown. Remember that SQL is a query language; it is simply not optimized for trigonometric functions.

The SQL will be faster, and your overall results will arrive faster, if you do a very naive distance search first (which returns more results than you need) and then whittle those results down.

If you do want distance in your query, at least use squared distance; SQRT calculations are notoriously slow, and squared distance is much easier to compute. Squared distance is simply the square of the distance rather than the distance itself. In a Cartesian coordinate system, since the sum of the squares of a right triangle's two short sides equals the square of the hypotenuse, it is easier to calculate the squared distance (just sum the two squares) than the actual distance. Then all you have to do is square the distance you want to compare against: instead of finding the exact distance and comparing it to your desired distance (say, 5), you find the squared distance and compare it to the squared desired distance (25, if your desired distance was 5).
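As a sketch of the idea applied to this table (the constants are mine, not the poster's): near the search point you can treat the earth as flat, compare squared distances in degree units, and skip both SQRT and the per-row haversine. One degree of latitude is roughly 69 miles, and a degree of longitude shrinks by cos(latitude):

    -- Keep the bounding-box predicates so idx_geo_latlon still narrows
    -- the scan, then filter on squared distance in degree units
    -- (5 mi / ~69 mi per degree ~= 0.0725 degrees; compare squares).
    SELECT name, categoryid, lat, lon
    FROM Listing
    WHERE status = '2'
      AND lon BETWEEN -122.10913433498 AND -121.96456366502
      AND lat BETWEEN 37.296909665016 AND 37.441480334984
      AND POWER(lat - 37.369195, 2)
          + POWER((lon - -122.036849) * COS(RADIANS(37.369195)), 2)
          <= POWER(5 / 69.0, 2);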

0

Depending on the number of records you have, you could create a view containing:

Listing1ID, Listing2ID, Distance

so that, basically, all the distances are "pre-calculated".

Then you can do something like:

SELECT listing2ID FROM v_Distance d WHERE distance < 5 AND listing1ID = XXX
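For reference, here is what such a view might look like (all names are mine, and this is only illustrative: a self-join of ~300,000 listings yields on the order of 9 x 10^10 pairs, so in practice you would materialize only nearby pairs into a real table):

    -- Hypothetical all-pairs distance view using the same haversine
    -- formula (miles) as the question's query.
    CREATE VIEW v_Distance AS
    SELECT a.id AS listing1ID,
           b.id AS listing2ID,
           3956 * 2 * ASIN(SQRT(
               POWER(SIN(RADIANS(b.lat - a.lat) / 2), 2)
             + COS(RADIANS(a.lat)) * COS(RADIANS(b.lat))
             * POWER(SIN(RADIANS(b.lon - a.lon) / 2), 2))) AS distance
    FROM Listing a
    JOIN Listing b ON b.id <> a.id;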

0

When I did geo-radius searching, I simply loaded all the zipcodes with their lat/lons into memory, used my starting point and radius to get the list of zipcodes within the radius, and then used that list in my db query. I used Solr for the search, since the search space was in the 20-million-row range, but the same principles should apply. Apologies for the shallowness of this answer; I'm on my phone.

0

Source: https://habr.com/ru/post/1285765/

