I am developing a web analytics system that should log the referrer URL, landing page URL and search keywords for each visitor to a website. I want to let the end user query this collected data with requests such as "Show me all the visitors who came from Bing.com searching for a phrase containing 'red shoes'" or "Show me all the visitors who landed on a URL containing 'campaign=twitter_ad'".
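To make the two example queries concrete, here is a minimal in-memory sketch. The `Visit` record layout and the function names are hypothetical, chosen only for illustration. Both queries are plain linear scans over every logged visit, which is exactly the approach that stops scaling as the data grows:

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Visit:
    # Hypothetical record layout; field names are illustrative only.
    visitor_id: str
    referrer_url: str      # URL the visitor came from
    landing_url: str       # first page the visitor saw on the site
    search_keywords: str   # search phrase, if the referrer was a search engine


def came_from_host_with_phrase(visits, host, phrase):
    """Visitors referred by `host` (or www.`host`) whose search phrase contains `phrase`."""
    phrase = phrase.lower()
    return [
        v.visitor_id
        for v in visits
        if urlsplit(v.referrer_url).hostname in (host, "www." + host)
        and phrase in v.search_keywords.lower()
    ]


def landed_on_url_containing(visits, fragment):
    """Visitors whose landing URL contains `fragment` as a substring (full scan)."""
    return [v.visitor_id for v in visits if fragment in v.landing_url]


visits = [
    Visit("v1", "https://www.bing.com/search?q=cheap+red+shoes",
          "https://shop.example/?campaign=twitter_ad", "cheap red shoes"),
    Visit("v2", "https://www.google.com/search?q=red+shoes",
          "https://shop.example/shoes", "red shoes"),
    Visit("v3", "https://twitter.com/",
          "https://shop.example/?campaign=twitter_ad", ""),
]

print(came_from_host_with_phrase(visits, "bing.com", "red shoes"))   # ['v1']
print(landed_on_url_containing(visits, "campaign=twitter_ad"))       # ['v1', 'v3']
```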
Since this system will be used on many large websites, the volume of logged data will grow very quickly. So my questions are: a) what is the best logging strategy so that scaling the system does not become a pain; and b) how can this architecture support fast, arbitrary (ad-hoc) queries? Is there a special way to store URLs so that querying them is faster?
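One common way to make URL lookups faster is to split each URL into its components when it is logged, rather than storing it only as one opaque string. The sketch below is a hypothetical illustration (the class and method names are invented for this example) that parses the landing URL's query string with `urllib.parse` and builds an inverted index from `(parameter, value)` pairs to visitor IDs, so `campaign=twitter_ad` becomes a direct lookup instead of a scan:

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


class LandingParamIndex:
    """Hypothetical inverted index: (query parameter, value) -> set of visitor IDs."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, visitor_id, landing_url):
        # Decompose the URL once at logging time; parse_qsl also decodes
        # percent-escapes and '+' so values are stored in normalized form.
        query = urlsplit(landing_url).query
        for name, value in parse_qsl(query, keep_blank_values=True):
            self._index[(name, value)].add(visitor_id)

    def lookup(self, name, value):
        # Exact-match lookup; returns a copy so callers cannot mutate the index.
        return set(self._index.get((name, value), set()))


idx = LandingParamIndex()
idx.add("v1", "https://shop.example/?campaign=twitter_ad&utm_source=t")
idx.add("v2", "https://shop.example/shoes?campaign=newsletter")
idx.add("v3", "https://shop.example/?campaign=twitter_ad")

print(sorted(idx.lookup("campaign", "twitter_ad")))  # ['v1', 'v3']
```

This only speeds up exact parameter matches; a true "URL contains any substring" query would still need a scan or a full-text/n-gram index. In a relational database the same idea would roughly correspond to a separate table of (visit, parameter name, parameter value) rows with a composite index on name and value.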
Besides the MySQL database I currently use, I am also exploring other alternatives that may be better suited to this task.