Digression... PARTITIONing by date should be very useful for you, since you delete everything older than a year. I would recommend PARTITION BY RANGE(TO_DAYS(...)) and splitting the table into 14 or 54 partitions (months or weeks, plus some overhead). This eliminates the time spent deleting old rows, since DROP PARTITION is almost instantaneous.
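As a rough illustration of the layout described above, here is a minimal sketch of a monthly-partitioned table. The table name `fact`, its columns, and the partition names are hypothetical placeholders, not taken from the question; only three partitions are shown to keep it short.

```sql
-- Hypothetical fact table; names and columns are placeholders.
CREATE TABLE fact (
    id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    a      INT UNSIGNED NOT NULL,
    b      INT UNSIGNED NOT NULL,
    dt     DATE NOT NULL,
    metric INT NOT NULL,
    PRIMARY KEY (id, dt)   -- every unique key must include the partitioning column
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(dt)) (
    PARTITION p2023_01 VALUES LESS THAN (TO_DAYS('2023-02-01')),
    PARTITION p2023_02 VALUES LESS THAN (TO_DAYS('2023-03-01')),
    PARTITION pfuture  VALUES LESS THAN MAXVALUE   -- catch-all for newer rows
);

-- Purging the oldest month is a metadata operation, not a row-by-row DELETE.
ALTER TABLE fact DROP PARTITION p2023_01;

-- Carve the next month out of the catch-all partition before data arrives for it.
ALTER TABLE fact REORGANIZE PARTITION pfuture INTO (
    PARTITION p2023_03 VALUES LESS THAN (TO_DAYS('2023-04-01')),
    PARTITION pfuture  VALUES LESS THAN MAXVALUE
);
```

Splitting `pfuture` while it is still empty keeps the REORGANIZE cheap, which is why it would normally run ahead of the first insert for the new month.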
See my partitioning blog for more details. Your situation sounds like both Use case #1 and Use case #3.
But back to your clever idea of dropping and rebuilding indexes. For others, I point out a caveat: you have the luxury of not otherwise touching the table for long enough to do the rebuild.
With PARTITIONing, all inserted rows go into the "latest" partition, right? That partition is much smaller than the whole table, so its indexes are more likely to fit in RAM, which can make updating them 10 times as fast (without dropping and rebuilding the indexes). If you provide SHOW CREATE TABLE, SHOW TABLE STATUS, innodb_buffer_pool_size, and RAM size, I can help you do the arithmetic to see whether your latest partition will fit in RAM.
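For a first pass at that arithmetic, something like the following could be used to compare per-partition sizes against the buffer pool. The schema name `mydb` and table name `fact` are placeholders, and the sizes InnoDB reports here are estimates, so treat the result as approximate.

```sql
-- Approximate data and index size of each partition, oldest first.
SELECT  PARTITION_NAME,
        TABLE_ROWS,
        ROUND(DATA_LENGTH  / 1024 / 1024) AS data_mb,
        ROUND(INDEX_LENGTH / 1024 / 1024) AS index_mb
    FROM information_schema.PARTITIONS
    WHERE TABLE_SCHEMA = 'mydb'      -- placeholder schema name
      AND TABLE_NAME   = 'fact'      -- placeholder table name
    ORDER BY PARTITION_ORDINAL_POSITION;

-- Buffer pool size in MB, for comparison with the latest partition.
SELECT ROUND(@@innodb_buffer_pool_size / 1024 / 1024) AS buffer_pool_mb;
```

If data_mb plus index_mb for the latest partition is comfortably below buffer_pool_mb, the nightly inserts have a good chance of staying in memory.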
A note about index updates in InnoDB: they are "delayed" by sitting in the "Change Buffer", which is part of the buffer_pool. See innodb_change_buffer_max_size, available since 5.6. Are you using that version or newer? (If not, you should upgrade for many reasons.)
The default for this setting is 25, which means that 25% of the buffer_pool is set aside for pending index updates caused by INSERT, etc. This acts like a "cache": multiple updates to the same index block are held there until they get flushed out. A higher setting should make index updates hit the disk less often, and therefore complete faster.
Where am I heading with this? By increasing this setting, you make the inserts (direct inserts, without dropping and rebuilding the indexes) more efficient. I think this can speed things up:
Shortly before the nightly INSERTs:
innodb_change_buffer_max_size = 50   (50 is the maximum this setting accepts)
innodb_old_blocks_pct = 10
Shortly after the nightly INSERTs:
innodb_change_buffer_max_size = 25
innodb_old_blocks_pct = 37
(I'm not sure about the other setting, but it seems reasonable to push it down.)
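Both settings are dynamic, so the switch around the nightly load could be scripted along these lines. This is only a sketch: the variable is spelled innodb_change_buffer_max_size in the MySQL manual, its documented upper limit is 50 (so 50 is used here as the "high" value), and SET GLOBAL needs a suitably privileged account.

```sql
-- Shortly before the nightly INSERTs: give the change buffer more room
-- and shrink the "old" sublist of the buffer pool.
SET GLOBAL innodb_change_buffer_max_size = 50;   -- documented maximum
SET GLOBAL innodb_old_blocks_pct = 10;

-- ... run the nightly load here ...

-- Shortly after the nightly INSERTs: back to the defaults.
SET GLOBAL innodb_change_buffer_max_size = 25;
SET GLOBAL innodb_old_blocks_pct = 37;
```

Whether this actually helps is something to measure on your own workload; timing the load with and without the change is the simplest check.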
Meanwhile, what is your setting of innodb_buffer_pool_size? Typically, it should be about 70% of available RAM.
In a similar application, I had big, hourly dumps to load into a table, and a 90-day retention. I stretched my partitioning rules by having 90 daily partitions and 24 hourly partitions. Every night I spent a lot of time (but less than an hour) doing REORGANIZE PARTITION to turn the 24 hourly partitions into a new daily partition (and dropping the 90-day-old partition). During each hour, the load had the added advantage that nothing else was touching the 1-hour partition: I could normalize, summarize, and load everything in 7 minutes. The entire 90 days fit in 400 GB. (Side note: a large number of partitions is a performance killer before 8.0, so do not even consider daily partitions for your 1-year retention.)
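The nightly hourly-to-daily merge could look roughly like this. Everything here is an assumption made for illustration: the table `hourly_fact`, its columns, the partition names, and the use of TO_SECONDS() as the partitioning function are not taken from the application described above. To keep the sketch short it uses only three hourly partitions; a real job would list all 24.

```sql
-- Hypothetical table with one old daily partition and a few hourly ones.
CREATE TABLE hourly_fact (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    ts      DATETIME NOT NULL,
    payload INT NOT NULL,
    PRIMARY KEY (id, ts)   -- must include the partitioning column
) ENGINE=InnoDB
PARTITION BY RANGE (TO_SECONDS(ts)) (
    PARTITION d20230531   VALUES LESS THAN (TO_SECONDS('2023-06-01 00:00:00')),
    PARTITION h2023060100 VALUES LESS THAN (TO_SECONDS('2023-06-01 01:00:00')),
    PARTITION h2023060101 VALUES LESS THAN (TO_SECONDS('2023-06-01 02:00:00')),
    PARTITION h2023060102 VALUES LESS THAN (TO_SECONDS('2023-06-01 03:00:00')),
    PARTITION pfuture     VALUES LESS THAN MAXVALUE
);

-- Merge adjacent hourly partitions into one larger partition.
-- The new upper bound equals the upper bound of the last partition merged.
ALTER TABLE hourly_fact
    REORGANIZE PARTITION h2023060100, h2023060101, h2023060102 INTO (
        PARTITION m20230601 VALUES LESS THAN (TO_SECONDS('2023-06-01 03:00:00'))
    );

-- Drop the partition that has aged out of the retention window.
ALTER TABLE hourly_fact DROP PARTITION d20230531;
```

Unlike DROP PARTITION, the REORGANIZE step copies rows, which is consistent with it being the slow part of the nightly job.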
The summary tables made it so that 50-minute queries (in the prototype) shrank to 2 seconds. Perhaps you need a summary table with PRIMARY KEY (a, b, date)? That would let you get rid of such an index in the Fact table. Oops, that eliminates the entire premise of your original question! See the links at the bottom of my blogs; look for Summary Tables. General rule: no indexes (other than the PRIMARY KEY) in the Fact table; use summary tables for things that need messier indexes.
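A minimal sketch of that idea follows, assuming a hypothetical fact table with columns a, b, dt, and metric (a small stand-in is created here so the example runs on its own). The summary table name, its aggregate columns, and the once-a-day refresh are illustrative choices, not something stated in the question.

```sql
-- Minimal stand-in for the fact table (hypothetical columns).
CREATE TABLE IF NOT EXISTS fact (
    id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    a      INT UNSIGNED NOT NULL,
    b      INT UNSIGNED NOT NULL,
    dt     DATE NOT NULL,
    metric INT NOT NULL,
    PRIMARY KEY (id, dt)
) ENGINE=InnoDB;

-- Summary table: one row per (a, b, day), keyed the way the queries look it up.
CREATE TABLE fact_summary (
    a          INT UNSIGNED NOT NULL,
    b          INT UNSIGNED NOT NULL,
    dt         DATE NOT NULL,
    row_count  INT UNSIGNED NOT NULL,
    metric_sum BIGINT NOT NULL,
    PRIMARY KEY (a, b, dt)
) ENGINE=InnoDB;

-- Nightly refresh for yesterday's data. Overwriting on a duplicate key
-- makes the statement safe to re-run for the same day.
INSERT INTO fact_summary (a, b, dt, row_count, metric_sum)
    SELECT a, b, dt, COUNT(*), SUM(metric)
        FROM fact
        WHERE dt = CURDATE() - INTERVAL 1 DAY
        GROUP BY a, b, dt
    ON DUPLICATE KEY UPDATE
        row_count  = VALUES(row_count),
        metric_sum = VALUES(metric_sum);
```

Reports would then read from fact_summary via its PRIMARY KEY, so the large table needs no secondary index on (a, b, date).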