Save millions of URLs in a database for quick pattern matching

I am developing a web analytics system that should record the referring URL, the landing-page URL and the search keywords for each visitor to a website. What I want to do with this collected data is let the end user run queries such as "Show me all the visitors who came from Bing.com looking for a phrase containing 'red shoes'" or "Show me all the visitors who landed on a URL containing 'campaign=twitter_ad'", etc.

Since this system will be used on many large websites, the amount of logged data will grow very quickly. So my questions are: (a) what would be the best strategy for logging so that scaling the system does not become a pain, and (b) what architecture makes these ad-hoc queries fast? Is there a special way to store URLs so that querying them is faster?
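To make the desired queries concrete (a toy illustration of my own, not part of the question; the record fields are hypothetical): each visit record carries a referrer and a landing URL, and the reports are substring filters over those fields.

```python
# Hypothetical visit log; the field names are my own invention.
visits = [
    {"id": 1, "referrer": "https://bing.com/search?q=red+shoes+sale",
     "landing": "https://shop.example/?campaign=twitter_ad"},
    {"id": 2, "referrer": "https://google.com/search?q=blue+hats",
     "landing": "https://shop.example/home"},
]

# "Visitors who came from Bing.com looking for a phrase containing 'red shoes'"
from_bing_red_shoes = [v["id"] for v in visits
                       if "bing.com" in v["referrer"]
                       and "red+shoes" in v["referrer"]]

# "Visitors who landed on a URL containing 'campaign=twitter_ad'"
twitter_ad_landings = [v["id"] for v in visits
                       if "campaign=twitter_ad" in v["landing"]]

print(from_bing_red_shoes, twitter_ad_landings)  # [1] [1]
```

At scale these naive substring scans are exactly what becomes expensive, which is what the answers below address.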

In addition to the MySQL database that I currently use, I am exploring (and discovering) other alternatives that are better suited to this task.

+3
3 answers

Hash or trie-index the URL strings: a lookup is then O(k), where k is the length of the URL, so the cost does not grow with the number of stored URLs.

Note that URLs can be long. In practice, capping stored URLs at about 1000 characters covers almost all real traffic and keeps the index size bounded.
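A minimal sketch of the idea above (my own illustration, not code from the answer): hashing a URL costs O(k) in its length, and the hash-table probe that follows is constant-time regardless of how many URLs are already stored.

```python
# Sketch: O(k) URL lookup via hashing, where k = len(url).
# The table size does not affect lookup cost; data is hypothetical.
visitors_by_landing_url = {}

def log_visit(url: str, visitor_id: int) -> None:
    # Hashing `url` touches each of its k characters once: O(k).
    visitors_by_landing_url.setdefault(url, []).append(visitor_id)

def visitors_for(url: str) -> list:
    # One O(k) hash plus an O(1) bucket probe.
    return visitors_by_landing_url.get(url, [])

log_visit("https://example.com/?campaign=twitter_ad", 1)
log_visit("https://example.com/?campaign=twitter_ad", 2)
log_visit("https://example.com/landing", 3)

print(visitors_for("https://example.com/?campaign=twitter_ad"))  # [1, 2]
```

A trie gives the same O(k) bound while also supporting prefix queries, at the price of a more complex structure.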

+2

In SQL Server, I keep URLS/TITLES in separate lookup tables, one row per distinct URL or TITLE. Lookups on URL/Title stay fast even with tens of millions of rows.

On the SQL Server side, I add a computed checksum column:

(checksum([URL],(0)))

and index it; MySQL has an equivalent function you could use the same way.

The point is that comparing a small integer checksum is far cheaper than comparing a long URL string: the checksum index narrows the search to a handful of candidate rows, and a comparison of the full url/title (or its surrogate PK) confirms the match.

Visitors are then linked through a USER_URL junction table whose FKs reference the PKs of USER and URL.

Hope this helps.
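The checksum trick can be sketched outside the database too (a toy illustration of my own, with Python's `zlib.crc32` standing in for SQL Server's CHECKSUM or MySQL's CRC32): the integer checksum narrows the search, and a full string comparison resolves any collisions.

```python
import zlib
from collections import defaultdict

# Toy URL store: an integer-checksum index in front of full-string rows.
rows = []                        # (url_id, url)
by_checksum = defaultdict(list)  # crc32 -> [url_id, ...]

def add_url(url: str) -> int:
    url_id = len(rows)
    rows.append((url_id, url))
    by_checksum[zlib.crc32(url.encode())].append(url_id)
    return url_id

def find_url(url: str):
    # 1. Cheap integer probe on the checksum index.
    # 2. Full string comparison only on the few candidates,
    #    which also handles checksum collisions correctly.
    for url_id in by_checksum.get(zlib.crc32(url.encode()), []):
        if rows[url_id][1] == url:
            return url_id
    return None

a = add_url("https://example.com/?campaign=twitter_ad")
b = add_url("https://bing.com/search?q=red+shoes")
print(find_url("https://bing.com/search?q=red+shoes") == b)  # True
print(find_url("https://example.com/missing") is None)       # True
```

In the database version, the checksum column is indexed and the full-string comparison happens in the WHERE clause, so only the candidate rows are ever read.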

0

I wish MySQL had a URI data type. Oracle has one (URIType), and since MySQL is now an Oracle product, it may even happen someday...

http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements001.htm#i160550

0

Source: https://habr.com/ru/post/1748681/

