How large should a MySQL table be before splitting it into multiple tables?

Problem: we have a very large table that keeps growing. Most of its records (say, 80%) are historical data, with a "DATE" field in the past, and are rarely requested. A small part (say, 20%) is current data, with a "DATE" field after the current date, and most queries search these current records.

Which of the following two scenarios is better, considering overall implementation complexity and performance?

  • Scenario A: split the large table into two tables, one for historical data and one for current data. Every day, records whose date has expired are moved from the current table to the historical table.

  • Scenario B: keep all records in one table, with the "DATE" field indexed.

Scenario A means more work to implement and maintain, plus the daily overhead of moving records between tables, while scenario B means searching one very large table (albeit an indexed one). Would that cause memory problems? Which scenario is recommended? Are there any other recommendations?

+5
3 answers

Usually you do not want to split a large table into several tables, although having a current table and a historical table is quite reasonable. Your process makes sense, and it lets you optimize the current table for your queries. Given the limited information you provide, I would probably go with two tables, since that allows such optimization.

However, do not split the historical data any further. Use partitioning instead; see the MySQL documentation. One caveat: queries must specify the partitioning key in the WHERE clause to take advantage of the partitions. With a large table, that is typical anyway.
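As a rough illustration of the partitioning suggestion, the sketch below builds a MySQL `ALTER TABLE ... PARTITION BY RANGE` statement with one partition per year. The table name `records_history`, the column name `event_date`, and the helper `yearly_partition_ddl` are all hypothetical; the script only prints the statement and does not run it against a server. Keep in mind that MySQL requires the partitioning column to be part of every unique key on the table, including the primary key.

```python
def yearly_partition_ddl(table, date_column, first_year, last_year):
    """Build a MySQL ALTER TABLE statement that range-partitions by year.

    table and date_column are hypothetical names supplied by the caller;
    they are interpolated as-is, so only pass trusted identifiers.
    """
    parts = [
        f"    PARTITION p{year} VALUES LESS THAN ({year + 1})"
        for year in range(first_year, last_year + 1)
    ]
    # Catch-all partition for rows dated after last_year.
    parts.append("    PARTITION pmax VALUES LESS THAN MAXVALUE")
    return (
        f"ALTER TABLE {table}\n"
        f"PARTITION BY RANGE (YEAR({date_column})) (\n"
        + ",\n".join(parts)
        + "\n);"
    )


if __name__ == "__main__":
    print(yearly_partition_ddl("records_history", "event_date", 2022, 2024))
```

With a layout like this, a query that filters on the partitioning column, for example `WHERE event_date >= '2024-01-01'`, gives MySQL the chance to skip the older partitions, which is the caveat about the WHERE clause mentioned above.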

+4

Question: is the historical data necessary for the functionality of the system, or are these records kept for other purposes (for example, audits)? Perhaps it is time to clean house by moving the historical data to an archive.

+2

In my experience, most big data systems have historical tables. In most cases I have seen, current data and historical data have different user groups. Current data is used by end users who work with customers on their current or recent transactions. Historical data is usually used by groups of users who do not need to talk directly with customers.

Do not worry too much about implementation and maintenance, since I think your main concern is performance. Implementation is a one-time effort: once you move the program(s) to production, the transfer simply runs at a set frequency (for example, weekly, monthly, or annually). Maintenance is minimal, and you can more or less forget about the job once it is in place. Just make sure you test the programs thoroughly.

With normalized historical tables, the current and historical tables have the same structure and field names, which greatly simplifies copying data. It also means you can easily join across the two tables.
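Because both tables share one structure, the periodic transfer can be as small as an `INSERT ... SELECT` followed by a `DELETE`, wrapped in one transaction. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere; the table names `records_current` and `records_history`, the column `event_date`, and the function `archive_expired` are assumptions for illustration, not names from the question.

```python
import sqlite3
from datetime import date

# Hypothetical shared structure for both tables; dates are ISO strings.
SCHEMA = "(id INTEGER PRIMARY KEY, event_date TEXT NOT NULL, payload TEXT)"


def archive_expired(conn, cutoff):
    """Move rows dated before cutoff from the current table to history.

    Returns the number of rows moved.
    """
    cutoff_str = cutoff.isoformat()
    # "with conn" commits on success and rolls back on error, so a row is
    # never lost or duplicated between the two tables.
    with conn:
        conn.execute(
            "INSERT INTO records_history "
            "SELECT * FROM records_current WHERE event_date < ?",
            (cutoff_str,),
        )
        deleted = conn.execute(
            "DELETE FROM records_current WHERE event_date < ?",
            (cutoff_str,),
        )
        return deleted.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE records_current {SCHEMA}")
    conn.execute(f"CREATE TABLE records_history {SCHEMA}")
    conn.executemany(
        "INSERT INTO records_current (event_date, payload) VALUES (?, ?)",
        [("2020-01-15", "old"), ("2999-12-31", "upcoming")],
    )
    moved = archive_expired(conn, date.today())
    print(moved)
```

The cutoff is computed once and passed to both statements, so a run that straddles midnight cannot copy one set of rows and delete another. On MySQL the same two statements would presumably run inside a transaction on InnoDB tables, triggered by cron or the MySQL Event Scheduler.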

If you decide not to split the data, you will keep adding index after index, and somewhere down the road you will still run into the same problem again.

+2

Source: https://habr.com/ru/post/1233421/

