Is adding and dropping indices every day on huge tables a good practice?

I am building a web application backed by a MySQL database. At the moment I have two huge tables, each containing about 40 million rows, and each of them receives roughly 500,000 to 1,000,000 new rows every day.

The new rows are added at night, when no one can use the application, and their contents depend on the results of some basic SELECT queries against the current database. To make these SELECTs fast enough, I use simple indexes (one column per index) on every column that appears at least once in a WHERE clause.

The catch is that during the day some completely different queries run against these tables, with WHERE clauses such as SELECT * FROM t1 WHERE a = a1 AND b = b1 AND (date BETWEEN d1 AND d2). I found on Stack Overflow this very useful mini-cookbook, which advises which INDEX to use depending on how the table is queried: http://mysql.rjweb.org/doc.php/index_cookbook_mysql It suggests using a composite index: for the query above, that gives INDEX (a, b, date).
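A minimal sketch of that composite index, using the table and column names from the example query. The index name idx_a_b_date and the literal values are placeholders, and the column types (integers and a DATE) are assumptions.

```sql
-- Equality columns (a, b) come first, the range column (`date`) comes last.
-- idx_a_b_date is a hypothetical index name.
ALTER TABLE t1 ADD INDEX idx_a_b_date (a, b, `date`);

-- Check which index the optimizer chooses: look at the "key" column
-- of the EXPLAIN output. The literal values are placeholders.
EXPLAIN
SELECT *
FROM t1
WHERE a = 1
  AND b = 2
  AND `date` BETWEEN '2024-01-01' AND '2024-01-31';
```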

This greatly sped up the queries executed during the day (from 1 minute down to 8 seconds, so I was really happy).

However, with these composite indexes in place, the time required to add the new rows at night completely exploded (it took more than a day to load one day's worth of data).

My question is: would it be okay to drop all the indexes every night, add the new content, and then re-create the indexes? Or would that be dangerous, since indexes are not meant to be rebuilt every day, especially on tables this large? I know that the whole operation (dropping and re-creating the indexes) takes about two hours.

I am aware of ALTER TABLE table_name DISABLE KEYS; , but I use InnoDB, and I believe it does not work with InnoDB tables.

Any tips from more experienced people are welcome! Thanks in advance.

+6
2 answers

I believe you have answered your own question: you need the indexes during the day, but not at night. Given what you describe, you should drop the indexes for the bulk inserts at night and re-create them afterwards. Dropping indexes to load data is not unheard of and seems appropriate in your case.
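As a rough sketch, the nightly cycle could look like the following. It assumes the composite index from the question is named idx_a_b_date (a hypothetical name) and that the single-column indexes used by the nightly SELECTs stay in place.

```sql
-- Before the nightly load: drop the composite index that is only
-- needed for the daytime queries. More DROP INDEX clauses can be
-- listed in the same statement, separated by commas.
ALTER TABLE t1
  DROP INDEX idx_a_b_date;

-- ... the nightly INSERTs run here ...

-- After the load: re-create it. If several indexes have to be rebuilt,
-- list all ADD INDEX clauses in one ALTER TABLE so that the table
-- data is scanned only once.
ALTER TABLE t1
  ADD INDEX idx_a_b_date (a, b, `date`);
```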

I would also ask how you insert the new data. One way is to insert the values one row at a time. Another is to load the values into a temporary table (without indexes) and then do a single bulk insert:

 insert into bigtable( . . .) select . . . from smalltable; 

The two approaches have different performance characteristics. You may find that a single bulk insert (if you are not already doing this) is fast enough for your purposes.
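A sketch of the staging-table approach, under stated assumptions: the staging table name staging_t1, the column list, and the column types are all hypothetical and would need to match the real table.

```sql
-- Staging table with the columns to be loaded and no secondary indexes.
CREATE TABLE staging_t1 (
  a      INT  NOT NULL,
  b      INT  NOT NULL,
  `date` DATE NOT NULL
) ENGINE=InnoDB;

-- The nightly job fills the staging table (placeholder rows shown here).
INSERT INTO staging_t1 (a, b, `date`) VALUES
  (1, 2, '2024-01-31'),
  (1, 3, '2024-01-31');

-- One bulk statement moves everything into the big table.
INSERT INTO t1 (a, b, `date`)
SELECT a, b, `date`
FROM staging_t1;

-- Empty the staging table for the next night.
TRUNCATE TABLE staging_t1;
```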

+2

Digression ... PARTITIONing by date should be very useful for you, since you delete rows that are more than a year old. I would recommend PARTITION BY RANGE(TO_DAYS(...)) and breaking the table into 14 or 54 partitions (months or weeks, plus some overhead). This eliminates the time spent deleting old rows, because DROP PARTITION is almost instantaneous.

See my blog on partitioning for more details. Your situation sounds like Use case #1 and Use case #3.
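For illustration, a monthly layout along these lines might be declared as follows. The table name t1_partitioned, the id column, the partition names, and the dates are assumptions made for this sketch; only a few partitions are shown.

```sql
CREATE TABLE t1_partitioned (
  id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  a      INT  NOT NULL,
  b      INT  NOT NULL,
  `date` DATE NOT NULL,
  -- The partitioning column must be part of every unique key,
  -- so `date` is appended to the primary key.
  PRIMARY KEY (id, `date`),
  INDEX idx_a_b_date (a, b, `date`)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(`date`)) (
  PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
  PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
  PARTITION p2024_03 VALUES LESS THAN (TO_DAYS('2024-04-01')),
  PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Purging the oldest month replaces a large DELETE.
ALTER TABLE t1_partitioned DROP PARTITION p2024_01;

-- Make room for the next month by splitting the catch-all partition.
ALTER TABLE t1_partitioned REORGANIZE PARTITION p_future INTO (
  PARTITION p2024_04 VALUES LESS THAN (TO_DAYS('2024-05-01')),
  PARTITION p_future VALUES LESS THAN MAXVALUE
);
```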

But back to your clever idea of dropping and rebuilding the indexes. As a caution to other readers: you have the luxury of a long window during which nothing touches the table, and that is what makes the rebuild feasible.

With PARTITIONing, all the inserted rows go into the "latest" partition, right? That partition is much smaller than the whole table, so there is a better chance that its indexes fit in RAM, which can make updating them 10 times faster (without rebuilding the indexes). If you provide SHOW CREATE TABLE , SHOW TABLE STATUS , innodb_buffer_pool_size and the RAM size, I can help you do the arithmetic to see whether your latest partition will fit in RAM.
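The figures requested above can be collected with statements like these. The table name t1 comes from the question; the sizes reported for InnoDB are estimates, and dividing them by the planned number of partitions gives only a rough per-partition figure.

```sql
-- Table definition and overall statistics.
SHOW CREATE TABLE t1;
SHOW TABLE STATUS LIKE 't1';

-- Approximate data and index size of the table, in megabytes.
SELECT TABLE_NAME,
       TABLE_ROWS,
       ROUND(DATA_LENGTH  / 1024 / 1024) AS data_mb,
       ROUND(INDEX_LENGTH / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't1';

-- Buffer pool size in megabytes, for comparison with index_mb.
SELECT ROUND(@@innodb_buffer_pool_size / 1024 / 1024) AS buffer_pool_mb;
```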

A note about index updates in InnoDB: they are "delayed" by sitting in the "Change buffer", which is part of the buffer_pool. See innodb_change_buffer_max_size , available since 5.6. Are you using that version or newer? (If not, you should upgrade for many reasons.)

The default value of this parameter is 25, which means that up to 25% of the buffer_pool is reserved for pending index updates caused by INSERT , etc. It acts like a "cache": multiple updates to the same index block are held there until they are flushed. A higher setting should make index updates less likely to hit the disk, and therefore complete faster.

Where am I heading with this ... By increasing this parameter, you make the inserts (done directly, without rebuilding the indexes) more efficient. I think this can speed things up:

Shortly before the nightly INSERTs :

 innodb_change_buffer_max_size = 50
 innodb_old_blocks_pct = 10

(50 is the largest value MySQL accepts for innodb_change_buffer_max_size.)

Shortly after the nightly INSERTs :

 innodb_change_buffer_max_size = 25
 innodb_old_blocks_pct = 37

(I'm not sure about the second setting, but it seems reasonable to lower it during the load.)
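Both settings are dynamic, so they can be switched at runtime without restarting the server. A sketch, assuming the account has the privilege to set global variables; note that MySQL spells the first variable innodb_change_buffer_max_size and documents 50 as its maximum, so 50 is used here as the highest possible value.

```sql
-- Shortly before the nightly INSERTs: give the change buffer more room
-- and shrink the "old" sublist of the buffer pool.
SET GLOBAL innodb_change_buffer_max_size = 50;  -- documented maximum is 50
SET GLOBAL innodb_old_blocks_pct = 10;

-- ... the nightly INSERTs run here ...

-- Shortly after the nightly INSERTs: back to the defaults.
SET GLOBAL innodb_change_buffer_max_size = 25;
SET GLOBAL innodb_old_blocks_pct = 37;

-- Verify the current values.
SELECT @@GLOBAL.innodb_change_buffer_max_size AS change_buffer_max_pct,
       @@GLOBAL.innodb_old_blocks_pct         AS old_blocks_pct;
```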

Meanwhile, what is the setting of innodb_buffer_pool_size ? Typically, this should be 70% of the available RAM.

In a similar application, I had big hourly dumps to load into a table with 90-day retention. I stretched my partitioning rules by having 90 daily partitions plus 24 hourly partitions. Every night I spent a fair amount of time (but less than an hour) running REORGANIZE PARTITION to merge the 24 hourly partitions into a new daily one (and dropping the 90-day-old partition). The hourly load had the added advantage that nothing else touched the 1-hour partition: I could normalize, summarize and load everything in 7 minutes. All 90 days fit in 400 GB. (Side note: a large number of partitions is a performance killer before 8.0, so do not even consider daily partitions for your 1-year retention.)
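The nightly merge could look roughly like this. Everything named here is hypothetical: a table hourly_facts partitioned by RANGE (UNIX_TIMESTAMP(ts)) on a TIMESTAMP column ts, hourly partitions h00 through h23 covering 2024-01-01, and an expired daily partition d20231003. The partitions being merged must be adjacent, and the new upper bound must equal the upper bound of the last merged partition.

```sql
-- Merge the 24 hourly partitions of one day into a single daily partition.
ALTER TABLE hourly_facts
  REORGANIZE PARTITION
    h00, h01, h02, h03, h04, h05, h06, h07, h08, h09, h10, h11,
    h12, h13, h14, h15, h16, h17, h18, h19, h20, h21, h22, h23
  INTO (
    PARTITION d20240101
      VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-02 00:00:00'))
  );

-- Drop the daily partition that has aged out of the 90-day window.
ALTER TABLE hourly_facts DROP PARTITION d20231003;
```

New hourly partitions for the following day still have to be added separately, for example by splitting a catch-all partition.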

Summary tables turned 50-minute queries (in the prototype) into 2-second queries. Perhaps you need a summary table with PRIMARY KEY (a, b, date) ? That would let you get rid of the corresponding index on the Fact table. Unfortunately, this removes the entire premise of your original question! See the links at the bottom of my blogs; look for Summary Tables. Rule of thumb: no indexes (other than the PRIMARY KEY ) on the Fact table; use summary tables for the things that need messier indexes.
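A sketch of such a summary (aggregate) table keyed on (a, b, date). The table name t1_summary, the row_count column, and the staging table staging_t1 holding the night's new rows are assumptions for illustration. This only helps daytime queries that need aggregates rather than the individual rows.

```sql
-- One row per (a, b, date) combination instead of one row per fact.
CREATE TABLE t1_summary (
  a         INT  NOT NULL,
  b         INT  NOT NULL,
  `date`    DATE NOT NULL,
  row_count INT UNSIGNED NOT NULL,
  PRIMARY KEY (a, b, `date`)
) ENGINE=InnoDB;

-- Fold the night's new rows into the summary table.
-- staging_t1 is a hypothetical table holding only the new rows.
INSERT INTO t1_summary (a, b, `date`, row_count)
SELECT a, b, `date`, COUNT(*)
FROM staging_t1
GROUP BY a, b, `date`
ON DUPLICATE KEY UPDATE row_count = row_count + VALUES(row_count);

-- Daytime query served from the primary key of the summary table.
SELECT SUM(row_count) AS matching_rows
FROM t1_summary
WHERE a = 1
  AND b = 2
  AND `date` BETWEEN '2024-01-01' AND '2024-01-31';
```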

+2

Source: https://habr.com/ru/post/1012587/

