Joining a GeoIP table with a table of IPs in MySQL

I'm having trouble getting fast lookups against tables that look like this:

    mysql> explain geo_ip;
    +--------------+------------------+------+-----+---------+-------+
    | Field        | Type             | Null | Key | Default | Extra |
    +--------------+------------------+------+-----+---------+-------+
    | ip_start     | varchar(32)      | NO   |     | ""      |       |
    | ip_end       | varchar(32)      | NO   |     | ""      |       |
    | ip_num_start | int(64) unsigned | NO   | PRI | 0       |       |
    | ip_num_end   | int(64) unsigned | NO   |     | 0       |       |
    | country_code | varchar(3)       | NO   |     | ""      |       |
    | country_name | varchar(64)      | NO   |     | ""      |       |
    | ip_poly      | geometry         | NO   | MUL | NULL    |       |
    +--------------+------------------+------+-----+---------+-------+

    mysql> explain entity_ip;
    +------------+---------------------+------+-----+---------+-------+
    | Field      | Type                | Null | Key | Default | Extra |
    +------------+---------------------+------+-----+---------+-------+
    | entity_id  | int(64) unsigned    | NO   | PRI | NULL    |       |
    | ip_1       | tinyint(3) unsigned | NO   |     | NULL    |       |
    | ip_2       | tinyint(3) unsigned | NO   |     | NULL    |       |
    | ip_3       | tinyint(3) unsigned | NO   |     | NULL    |       |
    | ip_4       | tinyint(3) unsigned | NO   |     | NULL    |       |
    | ip_num     | int(64) unsigned    | NO   |     | 0       |       |
    | ip_poly    | geometry            | NO   | MUL | NULL    |       |
    +------------+---------------------+------+-----+---------+-------+

Please note that I'm not interested in looking up the matching rows in geo_ip one IP address at a time; I need entity_ip LEFT JOIN geo_ip (or a similar/equivalent method).

This is what I have now (using polygons, as described in http://jcole.us/blog/archives/2007/11/24/on-efficiently-geo-referencing-ips-with-maxmind-geoip-and-mysql-gis/):

    mysql> EXPLAIN SELECT li.*, gi.country_code FROM entity_ip AS li
        -> LEFT JOIN geo_ip AS gi ON
        -> MBRCONTAINS(gi.`ip_poly`, li.`ip_poly`);
    +----+-------------+-------+------+---------------+------+---------+------+--------+-------+
    | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra |
    +----+-------------+-------+------+---------------+------+---------+------+--------+-------+
    |  1 | SIMPLE      | li    | ALL  | NULL          | NULL | NULL    | NULL |   2470 |       |
    |  1 | SIMPLE      | gi    | ALL  | ip_poly_index | NULL | NULL    | NULL | 155183 |       |
    +----+-------------+-------+------+---------------+------+---------+------+--------+-------+

    mysql> SELECT li.*, gi.country_code FROM entity AS li LEFT JOIN geo_ip AS gi ON MBRCONTAINS(gi.`ip_poly`, li.`ip_poly`) limit 0, 20;
    20 rows in set (2.22 sec)

Without polygons:

    mysql> explain SELECT li.*, gi.country_code FROM entity_ip AS li LEFT JOIN geo_ip AS gi ON li.`ip_num` >= gi.`ip_num_start` AND li.`ip_num` <= gi.`ip_num_end` LIMIT 0,20;
    +----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+
    | id | select_type | table | type | possible_keys             | key  | key_len | ref  | rows   | Extra |
    +----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+
    |  1 | SIMPLE      | li    | ALL  | NULL                      | NULL | NULL    | NULL |   2470 |       |
    |  1 | SIMPLE      | gi    | ALL  | PRIMARY,geo_ip,geo_ip_end | NULL | NULL    | NULL | 155183 |       |
    +----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+

    mysql> SELECT li.*, gi.country_code FROM entity_ip AS li LEFT JOIN geo_ip AS gi ON li.ip_num BETWEEN gi.ip_num_start AND gi.ip_num_end limit 0, 20;
    20 rows in set (2.00 sec)

(It makes no difference when a larger number of rows is requested.)

I currently can't get better performance out of these queries, and 0.1 seconds per IP is far too slow for me.

Is there any way to do this faster?

4 answers

This approach has some scalability issues (should you move to, say, city-level geo data), but for the data size given it provides a considerable optimization.

The problem you are facing is that MySQL does not optimize range-based queries very well. Ideally you want an exact ("=") lookup on an index rather than a "greater than" comparison, so we need to build such an index from the data you have. That way MySQL has far fewer rows to evaluate when looking for a match.

To do this, I suggest creating a lookup table that indexes the geolocation table by the first octet (the 1 in 1.2.3.4) of the IP addresses. The idea is that for every lookup you can ignore all geolocation ranges that do not cover the first octet of the IP address you are looking for.

    CREATE TABLE `ip_geolocation_lookup` (
      `first_octet` int(10) unsigned NOT NULL DEFAULT '0',
      `ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
      `ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
      KEY `first_octet` (`first_octet`,`ip_numeric_start`,`ip_numeric_end`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Next, we need to take the data in your geolocation table and generate rows covering every (first) octet that each geolocation row spans: if you have an entry with ip_start = '5.3.0.0' and ip_end = '8.16.0.0', the lookup table needs rows for octets 5, 6, 7, and 8. So...

    ip_geolocation
    | ip_start       | ip_end       | ip_numeric_start | ip_numeric_end |
    | 72.255.119.248 | 74.3.127.255 | 1224701944       | 1241743359     |

needs to be converted to:

    ip_geolocation_lookup
    | first_octet | ip_numeric_start | ip_numeric_end |
    | 72          | 1224701944       | 1241743359     |
    | 73          | 1224701944       | 1241743359     |
    | 74          | 1224701944       | 1241743359     |
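The numbers in this example can be reproduced with MySQL's built-in INET_ATON() function. A minimal sketch, using only the sample addresses shown above:

```sql
-- Convert the dotted-quad endpoints of the sample range to integers
-- and extract the first octet of each with a 24-bit right shift.
SELECT INET_ATON('72.255.119.248')         AS ip_numeric_start,   -- 1224701944
       INET_ATON('74.3.127.255')           AS ip_numeric_end,     -- 1241743359
       INET_ATON('72.255.119.248') >> 24   AS first_octet_start,  -- 72
       INET_ATON('74.3.127.255') >> 24     AS first_octet_end;    -- 74
```

The range starts in octet 72 and ends in octet 74, which is why it needs one lookup row each for 72, 73, and 74.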

Since someone here asked for a native MySQL solution, here is a stored procedure that will generate this data for you:

    DROP PROCEDURE IF EXISTS recalculate_ip_geolocation_lookup;

    -- Change the delimiter so the mysql client does not split the procedure body
    DELIMITER ;;
    CREATE PROCEDURE recalculate_ip_geolocation_lookup()
    BEGIN
        DECLARE i INT DEFAULT 0;

        -- Start from an empty lookup table
        DELETE FROM ip_geolocation_lookup;

        -- For each possible first octet, copy every range that covers it
        WHILE i < 256 DO
            INSERT INTO ip_geolocation_lookup (first_octet, ip_numeric_start, ip_numeric_end)
            SELECT i, ip_numeric_start, ip_numeric_end
            FROM ip_geolocation
            WHERE ( ip_numeric_start & 0xFF000000 ) >> 24 <= i
              AND ( ip_numeric_end & 0xFF000000 ) >> 24 >= i;

            SET i = i + 1;
        END WHILE;
    END;;
    DELIMITER ;

And then you will need to populate the table by calling this stored procedure:

 CALL recalculate_ip_geolocation_lookup(); 

At this point you can drop the procedure you just created; it is no longer needed unless you want to recalculate the lookup table.

Once the lookup table is in place, all you have to do is integrate it into your queries and make sure you query by the first octet. Your query against the lookup table needs to satisfy two conditions:

  • Find all rows matching the first octet of your IP address
  • From this subset: find the row whose range contains your IP address.

Since the second step is performed on a subset of the data, it is significantly faster than performing range tests on all the data. This is the key to this optimization strategy.

There are various ways to find out what the first octet of an IP address is; I used ( r.ip_numeric & 0xFF000000 ) >> 24, since my source IP addresses are in numeric form:

    SELECT r.*, g.country_code
    FROM ip_geolocation g, ip_geolocation_lookup l, ip_random r
    WHERE l.first_octet = ( r.ip_numeric & 0xFF000000 ) >> 24
      AND l.ip_numeric_start <= r.ip_numeric
      AND l.ip_numeric_end >= r.ip_numeric
      AND g.ip_numeric_start = l.ip_numeric_start;

Now, admittedly, I got a little lazy at the end: you could easily get rid of the ip_geolocation table altogether if you made the ip_geolocation_lookup table also contain the country data. I'm guessing that dropping one table from this query would make it a bit faster.
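That suggestion could be sketched roughly as follows. The table name ip_geolocation_lookup_country and its primary key are assumptions made for illustration; they are not part of the original answer.

```sql
-- Hypothetical variant of the lookup table that also carries the country code.
CREATE TABLE ip_geolocation_lookup_country (
  first_octet      TINYINT UNSIGNED NOT NULL,
  ip_numeric_start INT UNSIGNED NOT NULL,
  ip_numeric_end   INT UNSIGNED NOT NULL,
  country_code     VARCHAR(3) NOT NULL DEFAULT '',
  PRIMARY KEY (first_octet, ip_numeric_start)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Fill it from the existing lookup table and the original geolocation data.
INSERT INTO ip_geolocation_lookup_country
SELECT l.first_octet, l.ip_numeric_start, l.ip_numeric_end, g.country_code
FROM ip_geolocation_lookup AS l
JOIN ip_geolocation AS g ON g.ip_numeric_start = l.ip_numeric_start;

-- The lookup query then touches only two tables.
SELECT r.*, l.country_code
FROM ip_random AS r
JOIN ip_geolocation_lookup_country AS l
  ON l.first_octet = (r.ip_numeric >> 24)
 AND l.ip_numeric_start <= r.ip_numeric
 AND l.ip_numeric_end >= r.ip_numeric;
```

Whether this is actually faster would need to be measured; the trade-off is a wider lookup table in exchange for one less join.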

And finally, for reference, here are the two other tables I used in this answer, since they differ from your tables. I'm sure they are compatible, though.

    # This table contains the original geolocation data
    CREATE TABLE `ip_geolocation` (
      `ip_start` varchar(16) NOT NULL DEFAULT '',
      `ip_end` varchar(16) NOT NULL DEFAULT '',
      `ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
      `ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
      `country_code` varchar(3) NOT NULL DEFAULT '',
      `country_name` varchar(64) NOT NULL DEFAULT '',
      PRIMARY KEY (`ip_numeric_start`),
      KEY `country_code` (`country_code`),
      KEY `ip_start` (`ip_start`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    # This table simply holds random IP data that can be used for testing
    CREATE TABLE `ip_random` (
      `ip` varchar(16) NOT NULL DEFAULT '',
      `ip_numeric` int(10) unsigned NOT NULL DEFAULT '0',
      PRIMARY KEY (`ip`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
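Since these tables differ from the ones in the question, here is a rough sketch of the same idea applied to entity_ip and geo_ip. It assumes the lookup table was generated from geo_ip (so that ip_numeric_start corresponds to geo_ip.ip_num_start) and uses the ip_1 column of entity_ip as the first octet; neither assumption is confirmed by the answer itself.

```sql
-- LEFT JOINs keep entity rows that have no matching geolocation range.
SELECT li.*, gi.country_code
FROM entity_ip AS li
LEFT JOIN ip_geolocation_lookup AS l
       ON l.first_octet = li.ip_1            -- exact match narrows the candidates
      AND l.ip_numeric_start <= li.ip_num    -- range test on the small subset only
      AND l.ip_numeric_end >= li.ip_num
LEFT JOIN geo_ip AS gi
       ON gi.ip_num_start = l.ip_numeric_start;
```

Using ip_1 directly avoids computing the first octet with a bit shift for every row.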
+6
source

I can't comment, but user1281376's answer is wrong and doesn't work. The reason you use only the first octet is that otherwise you will not match all IP ranges. There are plenty of ranges spanning several second octets, which user1281376's modified query won't match. And yes, this does happen if you use the MaxMind GeoIP data.

With Aleksi's suggestion you can do a simple comparison on the first octet, thereby reducing the matching set.
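One way to check this against your own data is to count the ranges that cross a second-octet boundary. A minimal sketch, assuming the ip_geolocation table from the answer above:

```sql
-- Count ranges whose start and end addresses differ in their first two octets,
-- i.e. ranges that span more than one second-octet block.
SELECT COUNT(*) AS ranges_crossing_second_octet
FROM ip_geolocation
WHERE (ip_numeric_start >> 16) <> (ip_numeric_end >> 16);
```

Any non-zero count means that keying only on the first two octets of a range's start address would miss addresses that fall later in that range.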

+1
source

Just wanted to give something back to the community:

Here's an even better, more optimized approach that builds on Alexi's solution:

    DROP PROCEDURE IF EXISTS recalculate_ip_geolocation_lookup;

    DELIMITER ;;
    CREATE PROCEDURE recalculate_ip_geolocation_lookup()
    BEGIN
        DECLARE i INT DEFAULT 0;

        -- Rebuild the lookup table from scratch
        DROP TABLE IF EXISTS `ip_geolocation_lookup`;
        CREATE TABLE `ip_geolocation_lookup` (
            `first_octet` smallint(5) unsigned NOT NULL DEFAULT '0',
            `startIpNum` int(10) unsigned NOT NULL DEFAULT '0',
            `endIpNum` int(10) unsigned NOT NULL DEFAULT '0',
            `locId` int(11) NOT NULL,
            PRIMARY KEY (`first_octet`,`startIpNum`,`endIpNum`)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

        -- Seed with the prefix of each range's start and end address
        INSERT IGNORE INTO ip_geolocation_lookup
        SELECT startIpNum DIV 1048576 as first_octet, startIpNum, endIpNum, locId
        FROM ip_geolocation;

        INSERT IGNORE INTO ip_geolocation_lookup
        SELECT endIpNum DIV 1048576 as first_octet, startIpNum, endIpNum, locId
        FROM ip_geolocation;

        -- Fill in the prefixes between start and end, one step per iteration.
        -- Prefixes run from 0 to 4095 (a 32-bit address DIV 1048576),
        -- so 4096 iterations are enough.
        WHILE i < 4096 DO
            INSERT IGNORE INTO ip_geolocation_lookup
            SELECT i, startIpNum, endIpNum, locId
            FROM ip_geolocation_lookup
            WHERE first_octet = i-1 AND endIpNum DIV 1048576 > i;

            SET i = i + 1;
        END WHILE;
    END;;
    DELIMITER ;

    CALL recalculate_ip_geolocation_lookup();

It builds faster than his solution and narrows the search further, because the key is no longer just the first 8 bits of the address but the first 12 (DIV 1048576 discards the low 20 bits). Join performance: 100,000 rows in 158 ms. You may need to rename the tables and fields to match your version.
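To see what the DIV 1048576 key looks like in practice, here is a small sketch using the sample range from the first answer (72.255.119.248 to 74.3.127.255). Dividing by 1048576 (2^20) discards the low 20 bits of the address, so the key is a prefix between 0 and 4095:

```sql
SELECT INET_ATON('72.255.119.248') DIV 1048576 AS prefix_start,  -- 1167
       INET_ATON('74.3.127.255') DIV 1048576   AS prefix_end,    -- 1184
       INET_ATON('72.255.119.248') >> 24       AS first_octet;   -- 72
```

This range would occupy 18 lookup rows (prefixes 1167 through 1184) instead of 3, but each prefix bucket holds far fewer candidate ranges than a first-octet bucket.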

Query it with:

    SELECT ip, kl.*
    FROM random_ips ki
    JOIN `ip_geolocation_lookup` kb
      ON (ki.`ip` DIV 1048576 = kb.`first_octet`
          AND ki.`ip` >= kb.`startIpNum`
          AND ki.`ip` <= kb.`endIpNum`)
    JOIN ip_maxmind_locations kl ON kb.`locId` = kl.`locId`;
0
source

I found a simple way. I noticed that the first IP of every range satisfies ip % 256 = 0, so we can add an index table:

    CREATE TABLE `ip_geo_index` (
      `_ip` int(10) unsigned NOT NULL,
      `_ipStart` int(10) unsigned NOT NULL,
      PRIMARY KEY (`_ip`)
    ) ENGINE=MyISAM;

How to populate the index table:

    FOR_EACH (row of ip_geo) {
        FOR (ip FROM ipGroupStart / 256 TO ipGroupEnd / 256) {
            INSERT INTO ip_geo_index (ip, ipGroupStart);
        }
    }
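The pseudocode above could be written in SQL along the following lines. This is only a sketch: it assumes MySQL 8.0 or later (for recursive CTEs), an ip_geo_index table with columns _ip and _ipStart, and an ip_geo table with columns ipStart and ipEnd (ipEnd is a hypothetical name; only ipStart appears in the answer). It also relies on every range covering whole blocks of 256 addresses, otherwise two ranges would compete for the same _ip value.

```sql
-- Allow enough recursion steps for the widest range, measured in blocks of 256
-- addresses; adjust the limit to your data.
SET SESSION cte_max_recursion_depth = 1000000;

INSERT INTO ip_geo_index (_ip, _ipStart)
WITH RECURSIVE blocks (_ip, _ipStart, last_block) AS (
    -- First block of every range
    SELECT ipStart DIV 256, ipStart, ipEnd DIV 256
    FROM ip_geo
    UNION ALL
    -- Step to the next block until the end of the range is reached
    SELECT _ip + 1, _ipStart, last_block
    FROM blocks
    WHERE _ip < last_block
)
SELECT _ip, _ipStart
FROM blocks;
```

The resulting table has one row per block of 256 addresses, which is what allows the lookup to be a pure equality join.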

How to use:

    SELECT *
    FROM YOUR_TABLE AS A
    LEFT JOIN ip_geo_index AS B ON B._ip = A._ip DIV 256
    LEFT JOIN ip_geo AS C ON C.ipStart = B._ipStart;

More than 1000 times faster.

0
source

Source: https://habr.com/ru/post/901895/

