So, I have to rely on all the powers of the greatest mySQL minds that SO has to offer. I have to summarize the detailed entries based on the IP address in each entry. Here's the script:
In short, we have consortia that want to know: " Which schools in my consortium have watched which videos how many times ? In terms of SQL, this amounts to COUNTing detailed records grouped by the IP range it can fall into.
- We have several university consortia - each of which has several different schools that are members.
- Each school in the consortium uses different IP ranges to access the videos we serve at these schools.
- IP ranges are set using wildcards, so each school indicates something like “100.200.35.x, 100.201.xx, 100.202.39.50, etc.”, with the average number of ranges in the school being 10 or 15.
- The raw text log files for the summary are already in the database (one line for each journal entry) and have the actual IP address and access to the video file.
- There are 100 million detailed entries, so I fully expect it to be a long, slow process that lasts for a considerable period of time.
- There are PHP scripts that can explode wildcards in the individual IP addresses that are presented, but I'm afraid this will be the final answer and may take several weeks.
(For simplicity, I will only refer to the video files that have been accessed and RECORD the journal entries for it, but in fact all the details, such as start / stop / duration, etc., are and will ultimately be will be part of this decision.)
With Consortium, it writes something like this: (All table templates except log information are open for suggestions):
| id|consortium | | 10|Ivy League | | 20|California |
And School / IP writes something like this:
| id|school |consortium_id| | 101|Harvard |10 | | 102|Yale |10 | | 103|UCLA |20 | | 104|Berkeley |20 | | id|school_id|ip_range | | 1| 101 |100.200.xx | | 2| 101 |100.201.65.x | | 3| 101 |100.202.39.50 | | 4| 101 |100.202.39.51 | | 5| 101 |100.200.xx | | 6| 101 |100.201.65.x | | 7| 101 |100.202.39.50 |
And details something like this:
|session |ip_address |filename | |560554790925|100.202.390.500|history101.mp4 | |406417611526|43.22.90.5 |newsreel.mp4 | |650423700223|100.202.39.50 |history101.mp4 | |650423700223|100.202.50.12 |science101.mp4 | |513057324209|100.202.39.56 |history101.mp4 |
I like to think that I am very comfortable with mySQL, but it stretches it, and I hope that there will be an exciting feature or set of steps that someone can offer.
mysql
GDP May 31 '12 at 16:25 2012-05-31 16:25
source share