Summary of MySQL Detailed Records Matching IP Ranges - MySQL Jedi Knight Required

So, I have to rely on all the powers of the greatest mySQL minds that SO has to offer. I have to summarize the detailed entries based on the IP address in each entry. Here's the script:

In short, we have consortia that want to know: " Which schools in my consortium have watched which videos how many times ? In terms of SQL, this amounts to COUNTing detailed records grouped by the IP range it can fall into.

  • We have several university consortia - each of which has several different schools that are members.
  • Each school in the consortium uses different IP ranges to access the videos we serve at these schools.
  • IP ranges are set using wildcards, so each school indicates something like “100.200.35.x, 100.201.xx, 100.202.39.50, etc.”, with the average number of ranges in the school being 10 or 15.
  • The raw text log files for the summary are already in the database (one line for each journal entry) and have the actual IP address and access to the video file.
  • There are 100 million detailed entries, so I fully expect it to be a long, slow process that lasts for a considerable period of time.
  • There are PHP scripts that can explode wildcards in the individual IP addresses that are presented, but I'm afraid this will be the final answer and may take several weeks.

(For simplicity, I will only refer to the video files that have been accessed and RECORD the journal entries for it, but in fact all the details, such as start / stop / duration, etc., are and will ultimately be will be part of this decision.)

With Consortium, it writes something like this: (All table templates except log information are open for suggestions):

| id|consortium | | 10|Ivy League | | 20|California | 

And School / IP writes something like this:

 | id|school |consortium_id| | 101|Harvard |10 | | 102|Yale |10 | | 103|UCLA |20 | | 104|Berkeley |20 | | id|school_id|ip_range | | 1| 101 |100.200.xx | | 2| 101 |100.201.65.x | | 3| 101 |100.202.39.50 | | 4| 101 |100.202.39.51 | | 5| 101 |100.200.xx | | 6| 101 |100.201.65.x | | 7| 101 |100.202.39.50 | 

And details something like this:

 |session |ip_address |filename | |560554790925|100.202.390.500|history101.mp4 | |406417611526|43.22.90.5 |newsreel.mp4 | |650423700223|100.202.39.50 |history101.mp4 | |650423700223|100.202.50.12 |science101.mp4 | |513057324209|100.202.39.56 |history101.mp4 | 

I like to think that I am very comfortable with mySQL, but it stretches it, and I hope that there will be an exciting feature or set of steps that someone can offer.

+1
mysql
May 31 '12 at 16:25
source share
2 answers

With your existing data structure, you can perform string matching as follows (but this is not very efficient):

 SELECT schools.school, detail.filename, COUNT(*) FROM schools JOIN ipranges ON schools.id = ipranges.school_id JOIN detail ON detail.ip_address LIKE REPLACE(ipranges.ip_range, 'x', '%') WHERE schools.consortium_id = ? GROUP BY schools.school, detail.filename 

The best way would be to save your IP ranges as network address and prefix length:

 ALTER TABLE ipranges ADD COLUMN network INT UNSIGNED, ADD COLUMN prefix TINYINT; UPDATE ipranges SET network = INET_ATON(REPLACE(ip_range, 'x', 0)), prefix = 32 - 8*(CHAR_LENGTH(ip_range) - CHAR_LENGTH(REPLACE(ip_range,'x','')); ALTER TABLE ipranges DROP COLUMN ip_range; ALTER TABLE detail ADD COLUMN ip_address_new INT UNSIGNED; UPDATE detail SET ip_address_new = INET_ATON(ip_address); ALTER TABLE detail DROP COLUMN ip_address, CHANGE ip_address_new ip_address INT UNSIGNED; 

Then it will be just a case of some bit comparisons:

 SELECT schools.school, detail.filename, COUNT(*) FROM schools JOIN ipranges ON schools.id = ipranges.school_id JOIN detail ON detail.ip_address & ~((1 << 32 - ipranges.prefix) - 1) = ipranges.network WHERE schools.consortium_id = ? GROUP BY schools.school, detail.filename 
+2
May 31 '12 at 16:34
source share
 SELECT D.filename, S.school, COUNT(D.*) FROM detail_records AS D INNER JOIN ip_map AS I ON D.ip_address LIKE CONCAT(SUBSTRING(I.ip_range, 1, LOCATE('x', I.ip_range)-1), '%') INNER JOIN school AS S ON S.id = I.school_id INNER JOIN consortium AS C ON C.id = S.consortium_id WHERE S.consortium_id = <consortium identifier> GROUP BY D.filename, S.school 
0
May 31 '12 at 16:51
source share



All Articles