We have a mysql tracking database that has ~ 100M rows. We usually fulfill requests every day for specific activities, unique visits, etc. The problem is that the queries generate a monthly report more slowly due to the interaction of the indices (we have a range scan for the date, then the queries are based on several other fields.)
To improve performance, we switched to date-based joins to avoid range scanning, and performance is much better. Thus, the idea was presented that perhaps we just need to outline the day with a different table for each day. Benefits:
- quick inserts - every day the table is new / small, therefore it is always fast.
- Deleting old data is simple (instead of deleting 5M rows from a 100M row table, we can just delete the table)
- Our current approach makes joins anyway - so we just join different tables instead of different values ββfrom the same table.
Has anyone heard or tried this approach? Are there any predictable problems?
Note We are considering other noSQL approaches, but we would like to know if this is a valid approach if we decide to stay with MySQL (so please do not suggest "try XYZ noSQL DB"). In addition, I understand that we can simply get a much better machine and that in the field of data sets it is not so much, but we do not want to spend money on a larger machine if a smaller one works without a lot of additional work.
source share