How to create an appropriate database for a traffic analysis system?

How do I design the right structure for an analytics service? Currently I have one table that stores data about every user who visits a page containing my client's ID, so that my clients can later see statistics for a given date.

Today I thought about it a little, and I'm curious: let's say I have 1,000 clients, and each of their sites gets about 1,000 hits every day, so every day I get 1,000,000 new rows in one table. How will it perform in two months or so, when the table reaches 60 million records?

I suspect that after a while it will have so many records that the PHP queries pulling data out will become really heavy, slow, and resource-hungry, right? How can I prevent this?

A friend of mine is working on something similar, and he is going to create a new table for each client. Is this the right way?

Thanks!

+6
4 answers

The problem you are facing is an I/O problem. One million records per day is roughly 12 inserts per second (1,000,000 / 86,400 ≈ 11.6). That is achievable, but pulling data out while writing at the same time will make your system disk-bound.
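The write-rate arithmetic above can be checked in a few lines; the figures simply mirror the assumptions in the question (1,000 clients at roughly 1,000 hits per day each):

```python
# Back-of-envelope write-rate estimate for the scenario in the question.
clients = 1_000          # assumed number of clients
hits_per_client = 1_000  # assumed daily hits per client site
rows_per_day = clients * hits_per_client

seconds_per_day = 24 * 60 * 60  # 86,400
writes_per_second = rows_per_day / seconds_per_day

print(f"{rows_per_day:,} rows/day ≈ {writes_per_second:.1f} inserts/second")
# roughly 11.6 inserts per second on average (peaks will be higher)
```

Keep in mind this is an average; real traffic is bursty, so peak load can easily be several times that.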

What you need to do is configure your database to support the amount of I/O you will be doing. For example: use an appropriate database engine (InnoDB, not MyISAM); make sure you have a fast enough disk subsystem (RAID, not single disks, which can fail you at some point); design your schema carefully; check your queries with EXPLAIN to see where they may be going wrong; maybe even use a different storage engine. Personally, I would use TokuDB if I were you.

In addition, I sincerely hope you are doing your querying, sorting, and filtering on the database side, not on the PHP side.
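To illustrate what checking a query plan buys you: the sketch below uses SQLite's `EXPLAIN QUERY PLAN` (via Python's built-in `sqlite3`) as a stand-in for MySQL's `EXPLAIN`; the table and column names are made up for illustration, not taken from the original post.

```python
import sqlite3

# Hypothetical hits table with an index on client_id.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hits (client_id INTEGER, page TEXT, ts TEXT)")
con.execute("CREATE INDEX idx_hits_client ON hits (client_id)")

# Ask the engine how it would execute a per-client count.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM hits WHERE client_id = ?", (42,)
).fetchall()
for row in plan:
    # The plan should mention idx_hits_client rather than a full table scan.
    print(row[-1])
```

Without the index, the same query plan would show a scan of the whole table, which is exactly the kind of problem EXPLAIN makes visible before the table reaches 60 million rows.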

+2

Take a look at the Google Analytics Platform Component Overview page and pay particular attention to how data is written to the database, given the architecture of the whole system.

Instead of writing everything to your database immediately, you can append everything to a log file and process the log later (perhaps at a time when traffic is low). You will still have to get all those records into the database eventually, but if you batch them and apply them when the load is more tolerable, your system will scale much better.
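The "log now, insert in batches later" idea can be sketched as follows; in practice the hit lines would come from a log file written by the tracking endpoint, and the table and field names here are illustrative, not from the original post:

```python
import sqlite3

# Hit lines as the tracking endpoint might have logged them:
# client_id <TAB> page <TAB> timestamp. Inlined here for the sketch.
log_lines = [
    "7\t/index.html\t2024-01-01T10:00:00",
    "7\t/about.html\t2024-01-01T10:00:05",
    "9\t/index.html\t2024-01-01T10:00:07",
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hits (client_id INTEGER, page TEXT, ts TEXT)")

# One batched multi-row insert instead of one round trip per hit.
rows = [tuple(line.split("\t")) for line in log_lines]
con.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)
con.commit()

print(con.execute("SELECT COUNT(*) FROM hits").fetchone()[0])  # 3
```

The batch job can run off-peak, and a single transaction covering many rows is far cheaper than committing each hit individually.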

+1

You can normalize the data model as follows:

 Client Table {
     ID
     Name
 }

 Pages Table {
     ID
     Page_Name
 }

 PagesClientsVisits Table {
     ID
     Client_ID
     Page_ID
     Visits
 }

and simply increment the visit count in the last table for each new hit. The number of rows in it then stays at (number of clients * number of pages).
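A minimal sketch of this normalized-counter idea, using Python's built-in `sqlite3` (the names mirror the schema above; SQLite's `INSERT ... ON CONFLICT` plays the role that `INSERT ... ON DUPLICATE KEY UPDATE` would in MySQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE pages_clients_visits (
        client_id INTEGER,
        page_id   INTEGER,
        visits    INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (client_id, page_id)
    )
""")

def record_hit(client_id, page_id):
    # Insert the (client, page) row on first sight, otherwise bump the counter.
    con.execute(
        """INSERT INTO pages_clients_visits (client_id, page_id, visits)
           VALUES (?, ?, 1)
           ON CONFLICT (client_id, page_id)
           DO UPDATE SET visits = visits + 1""",
        (client_id, page_id),
    )

for _ in range(3):
    record_hit(7, 1)   # three hits on the same page
record_hit(7, 2)       # one hit on another page

print(con.execute(
    "SELECT visits FROM pages_clients_visits WHERE client_id = 7 AND page_id = 1"
).fetchone()[0])  # 3
```

The trade-off is that you keep only aggregate counts; if your clients need per-visit detail (referrers, timestamps, dates), you still need the raw rows somewhere.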

-1

Having a table with 60 million records can be perfectly fine; that is what databases are for. But you have to be careful about how many columns the table has, and about the data type (and therefore the size) of each column.

You are building reports on this data. Think about what data you really need for those reports. For example, you may only need the number of visits per user per page, in which case a simple count will do the trick.

What you can also do is generate a report every night and then delete the raw data.
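The nightly "aggregate, then purge" job can be sketched like this; the schema and names are illustrative assumptions, using Python's built-in `sqlite3`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE hits (client_id INTEGER, page TEXT, day TEXT);
    CREATE TABLE daily_report (client_id INTEGER, page TEXT,
                               day TEXT, visits INTEGER);
""")
con.executemany("INSERT INTO hits VALUES (?, ?, ?)", [
    (7, "/index.html", "2024-01-01"),
    (7, "/index.html", "2024-01-01"),
    (7, "/about.html", "2024-01-01"),
])

day = "2024-01-01"
# Roll the raw hits up into per-page counts for the day...
con.execute("""
    INSERT INTO daily_report
    SELECT client_id, page, day, COUNT(*)
    FROM hits WHERE day = ?
    GROUP BY client_id, page, day
""", (day,))
# ...then drop the raw rows for that day.
con.execute("DELETE FROM hits WHERE day = ?", (day,))
con.commit()

print(con.execute("SELECT COUNT(*) FROM hits").fetchone()[0])             # 0
print(con.execute("SELECT SUM(visits) FROM daily_report").fetchone()[0])  # 3
```

The raw table then stays small (only the current day's data), while the report table grows slowly, by one row per client, page, and day.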

So do some reading and think it through.

-1

Source: https://habr.com/ru/post/904152/

