Are there any flaws in splitting a tracking table by date?

We have a MySQL tracking database with about 100M rows. Every day we run queries for specific activities, unique visits, and so on. The problem is that the queries that generate a monthly report are slow because of how the indexes interact (there is a range scan on the date, and the queries then filter on several other fields).

To improve performance, we switched to date-based joins to avoid the range scan, and performance is much better. This led to the idea that perhaps we should simply split the data by day, with a separate table for each day. Benefits:

  • Fast inserts: each day's table is new and small, so inserts stay fast.
  • Deleting old data is simple (instead of deleting 5M rows from a 100M-row table, we can just drop the table).
  • Our current approach already uses joins, so we would simply join different tables instead of different values from the same table.
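A minimal sketch of what the per-day-table scheme might look like in MySQL, assuming an empty template table `tracking_template` and daily tables named `tracking_YYYYMMDD` (all table and column names here are hypothetical). It also shows one cost of the scheme: any report that spans several days has to name every daily table, so in practice that SQL would have to be generated by application code.

```sql
-- Hypothetical names: tracking_template is an empty table that carries
-- the desired columns and indexes; each day gets its own copy of it.
CREATE TABLE tracking_20240102 LIKE tracking_template;

-- That day's inserts go to the small, new table.
INSERT INTO tracking_20240102 (visitor, activity, tr_date)
VALUES (42, 'page_view', '2024-01-02');

-- Expiring a day is a single DROP instead of a multi-million-row DELETE.
DROP TABLE tracking_20231201;

-- A monthly report has to list every daily table explicitly.
SELECT activity, COUNT(DISTINCT visitor) AS unique_visitors
FROM (
    SELECT visitor, activity FROM tracking_20240101
    UNION ALL
    SELECT visitor, activity FROM tracking_20240102
    -- one UNION ALL branch per remaining day of the month
) AS month_data
GROUP BY activity;
```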

Has anyone heard of or tried this approach? Are there any foreseeable problems?

Note: we are also considering NoSQL approaches, but we would like to know whether this is a valid approach if we decide to stay with MySQL (so please do not suggest "try XYZ NoSQL DB"). I also understand that we could simply buy a much better machine and that, as data sets go, this one is not that large, but we do not want to spend money on a bigger machine if a smaller one works without a lot of extra work.

1 answer

It sounds like you should take a look at MySQL Partitioning.

Partitioning enables you to distribute portions of individual tables across a file system according to rules which you can set largely as needed. In effect, different portions of a table are stored as separate tables in different locations. The user-selected rule by which the division of data is accomplished is known as a partitioning function, which in MySQL can be the modulus, simple matching against a set of ranges or value lists, an internal hashing function, or a linear hashing function. The function is selected according to the partitioning type specified by the user, and takes as its parameter the value of a user-supplied expression. This expression can be a column value, a function acting on one or more column values, or a set of one or more column values, depending on the type of partitioning that is used.
In your case, hash partitioning on the month part of the date might be useful:

CREATE TABLE Mydata (id INT, amount DECIMAL(7,2), tr_date DATE) ENGINE=InnoDB
PARTITION BY HASH( MONTH(tr_date) ) PARTITIONS 12;

A partitioned solution is conceptually the same as yours, but the RDBMS handles many of the details for you.
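A minimal sketch of a variant that stays closer to the daily-table idea, assuming RANGE partitioning is used instead of the HASH example. With HASH(MONTH(tr_date)), rows from the same month of different years share a partition, and DROP PARTITION works only on RANGE and LIST partitions, so the cheap "drop old data" benefit from the question would be lost. The table `tracking`, its columns, and the partition names below are hypothetical.

```sql
-- Hypothetical schema: one RANGE partition per day.
-- Every unique key must include the partitioning column,
-- hence PRIMARY KEY (id, tr_date) rather than PRIMARY KEY (id).
CREATE TABLE tracking (
    id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    visitor  INT UNSIGNED    NOT NULL,
    activity VARCHAR(32)     NOT NULL,
    tr_date  DATE            NOT NULL,
    PRIMARY KEY (id, tr_date),
    KEY idx_activity_date (activity, tr_date)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(tr_date)) (
    PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
    PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Daily job: carve the next day out of the catch-all partition
-- (cheap while pmax is still empty).
ALTER TABLE tracking REORGANIZE PARTITION pmax INTO (
    PARTITION p20240103 VALUES LESS THAN (TO_DAYS('2024-01-04')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Expiring a day removes the whole partition instead of running
-- a multi-million-row DELETE.
ALTER TABLE tracking DROP PARTITION p20240101;

-- Monthly report: the range on tr_date lets the optimizer skip
-- partitions outside January (partition pruning); in MySQL 5.7+ the
-- "partitions" column of EXPLAIN lists the partitions actually read.
EXPLAIN
SELECT activity, COUNT(DISTINCT visitor) AS unique_visitors
FROM tracking
WHERE tr_date >= '2024-01-01' AND tr_date < '2024-02-01'
GROUP BY activity;
```

Compared with separate daily tables, inserts and queries keep targeting one table name, while partition maintenance becomes a scheduled ALTER TABLE. Since MySQL 5.6.7 a table can have up to 8192 partitions, which leaves room for several years of daily partitions.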


Source: https://habr.com/ru/post/1201999/

