Optimal web analytics data warehouse

I would like to write my own solution for web analytics and activity tracking and use it as a feedback mechanism, for example to drive search or content suggestions.

If it were only for short-lived data, I would use some NoSQL engine with limited data storage. But ideally, I would like to keep a long history.

One nice approach I've seen in the past was to use MySQL for storage, with one table per month, while old tables are converted to the MySQL ARCHIVE storage engine. MySQL views were used to query the archives and the aggregated data.
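A minimal sketch of how the table-per-month scheme could be scripted. It only builds the SQL strings; the table prefix `pageviews_`, the three columns, and the view name `pageviews_all` are hypothetical names chosen for illustration, not part of the setup described above. The tables are defined without keys because the ARCHIVE engine does not support ordinary indexes.

```python
from datetime import date


def table_name(month: date) -> str:
    # Hypothetical naming convention: pageviews_YYYY_MM
    return f"pageviews_{month:%Y_%m}"


def create_month_table(month: date) -> str:
    # No keys are defined, so the table can later be converted to ARCHIVE,
    # which does not support ordinary indexes.
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name(month)} ("
        "visited_at DATETIME NOT NULL, "
        "visitor_id BIGINT NOT NULL, "
        "url VARCHAR(2048) NOT NULL"
        ") ENGINE=InnoDB"
    )


def archive_month_table(month: date) -> str:
    # Rebuild a finished month as a compressed ARCHIVE table.
    return f"ALTER TABLE {table_name(month)} ENGINE=ARCHIVE"


def create_union_view(months: list[date], view: str = "pageviews_all") -> str:
    # One view over all monthly tables so queries need not know the layout.
    selects = " UNION ALL ".join(
        f"SELECT visited_at, visitor_id, url FROM {table_name(m)}"
        for m in months
    )
    return f"CREATE OR REPLACE VIEW {view} AS {selects}"
```

The statements would be run by a monthly job: create the new month's table, archive the previous one, then recreate the view. Queries against the view scan every monthly table, so reports over long ranges would probably still need separate pre-aggregated tables.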

My question is: how does something like Google Analytics store its data? In a structured database, or in something else? How would you recommend avoiding long-term storage bloat while keeping query capabilities flexible?

(I am not concerned about write speed to the database; writes will happen in asynchronous batches, not in real time.)

+4
2 answers

Google uses its own Bigtable system to store its data. If you are interested in big data solutions, you should take a look at it. For an open source implementation modeled on Google's Bigtable, check out HBase / Hadoop. I will post some links in a minute.

Analytics on this kind of data are done with map/reduce operations.
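To illustrate the idea, here is a toy single-process sketch of a map/reduce job that counts page views per URL and day. It is not Hadoop or Bigtable code; the hit fields `url` and `day` and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_hit(hit):
    # Map step: emit one ((url, day), 1) pair per recorded hit.
    yield (hit["url"], hit["day"]), 1


def shuffle(pairs):
    # Group emitted values by key, as the framework would do between steps.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    # Reduce step: collapse all values for one key into a single total.
    return key, sum(values)


def count_pageviews(hits):
    pairs = (pair for hit in hits for pair in map_hit(hit))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce functions run in parallel on many machines and the grouping step is handled by the framework; only the two small functions are written by the analyst.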

+2

I think Urchin originally used its own custom multidimensional database, but I'm not sure whether Google Analytics still uses it. In any case, analytics systems often use multidimensional (OLAP) databases; Mondrian works the same way, but uses a relational database for storage.

+1

Source: https://habr.com/ru/post/1397180/