How to optimize a large table in MySQL, and when can I benefit from partitioning?

In short, partitioning by date range and tuning the memory configuration accomplished my goal.

I needed to increase the memory allocated via innodb_buffer_pool_size, since the default value of 8M was far too low. Rick James recommends 70% of RAM for this parameter; his site has a lot of great information.
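
As a quick sanity check (not from the original post), the current pool size and miss rate can be read straight from the server; Innodb_buffer_pool_reads making up a large share of Innodb_buffer_pool_read_requests is the usual sign that the pool is too small:

 -- Rough sketch: inspect the pool size and how often reads have to hit disk
 SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
 SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- logical read requests
 SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that missed the pool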

Edlerd was right with both suggestions :-)

I split my data into monthly partitions and then ran a query that returns roughly 6,000 rows, which previously took 6 to 12 seconds. It now completes in under a second (.984 / .031). I ran this with the default InnoDB buffer pool size (innodb_buffer_pool_size = 8M) to confirm the gain was not just from the extra memory.

Then I set innodb_buffer_pool_size = 4G and ran the same query with an even better result: .062 / .032.

I'd also like to mention that the extra memory also improved the overall speed of my web application and of the services that receive and write messages to this table; I am amazed at how much difference this one setting made. Time to first byte (TTFB) from my web server, which at times had been reaching 20 seconds, is now almost on par with MySQL Workbench.

I also found the slow query log to be a great tool for identifying problems: it flagged that my innodb_buffer_pool_size was low and highlighted all the poorly performing queries. It also identified places where I needed to add indexes on other tables.
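
For reference, a minimal sketch of turning the slow query log on at runtime; the threshold and file path below are assumptions, not values from the original post:

 SET GLOBAL slow_query_log = 'ON';
 SET GLOBAL long_query_time = 1;                               -- seconds; log anything slower
 SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';   -- assumed path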

EDIT 2016-11-12: the above is the solution I ended up with; my original question follows below.

I am reorganizing a large table that records telemetry data. It has been running for about 4-5 months and has accumulated approx. 54 million records, with an average row size of approx. 380 bytes.

I began to notice performance degradation in one of my raw-data queries, which returns all the logs for a device over a 24-hour period.

Initially I suspected indexing, but I now think it is the amount of I/O that MySQL has to handle. A typical 24-hour query returns anywhere from roughly 2.2k to 9k records, and I would actually like to support exports of about 7 days.

I have no experience in database performance tuning, so I'm still learning the ropes. I am considering several strategies:

  • Rework the indexes to match the raw-data query, although I think my indexes are OK, as the EXPLAIN plan shows a 100% hit.
  • Consider creating a covering index that includes all the needed columns.
  • Implement date-range partitioning: a) keep monthly partitions, e.g. the last 6 months; b) move anything older to an archive table.
  • Create a separate table (vertical partition) for the raw data and join it to the primary table by ID. Not sure this addresses my problem, since my indexes appear to work.
  • Modify my queries to pull the data in batches with LIMIT, ordered by date, continuing until no more records are returned.
  • Review the server configuration.

1 & 2 (INDEXES): I am reviewing my indexes against my queries, but I think I'm good here, since EXPLAIN shows a 100% hit — if I'm not reading it wrong.
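
A sketch of that check, run against the raw-data query (column list abbreviated); the things to look at are which key is chosen and whether the row estimate is close to the real result size:

 EXPLAIN
 SELECT ml.db_id, ml.created, ml.device_id
 FROM message_log ml
 WHERE ml.device_id = @IMEI
   AND ml.created BETWEEN @STARTDATE AND DATE_ADD(@STARTDATE, INTERVAL 24 HOUR)
 ORDER BY ml.db_id;
 -- Expect key = DeviceID_AND_Created and filtered = 100.00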

I will try a covering index when I rebuild, but how can I tell whether it introduces negative side effects, e.g. compromised insert speed?
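
As an illustration only (the query selects far too many columns for a fully covering index to be practical), a narrower covering index would look like this; the index name and column choice are assumptions:

 ALTER TABLE message_log
   ADD INDEX Device_Created_Covering (device_id, created, db_id, lat, lon, speed);
 -- Every extra secondary index adds write cost, which is the insert-speed trade-off mentioned above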

What is the best way to monitor the performance of my table in a live environment?

EDIT: I have just started using the slow query log, which looks like a good tool for finding problems, and I suppose querying performance_schema might be another option?
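
If performance_schema is enabled (MySQL 5.6+), a sketch of pulling the heaviest statement digests, which covers similar ground to the slow log:

 SELECT DIGEST_TEXT,
        COUNT_STAR,
        SUM_TIMER_WAIT / 1e12 AS total_seconds,   -- timers are in picoseconds
        SUM_ROWS_EXAMINED
 FROM performance_schema.events_statements_summary_by_digest
 ORDER BY SUM_TIMER_WAIT DESC
 LIMIT 10;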

3 (PARTITIONING): I have read a bit about partitioning, but am not sure whether the size of my data will make it worthwhile.

Rick James suggests it starts to matter above ~1M records; I'm at 54M and would like to keep about 300M before archiving, so is my table a good candidate?

I will have to test this myself, as I have no experience with any of it and it is all theoretical to me. I just do not want to go down this path if it does not suit my needs.

4 (Vertical partitioning via a joined detail table): I do not think table scans are my problem, and I need all of the data anyway, so I'm not sure this method will help.

5 (Batching queries with LIMIT): Would this ease the load on the server by spending less time in a single query? Would I get better I/O throughput, at the cost of issuing more statements over one connection?
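
A sketch of what such batching could look like using keyset pagination on db_id; the batch size and @LAST_ID variable are assumptions rather than anything from the original question:

 -- Repeat, feeding the largest db_id of each batch back in as @LAST_ID (start at 0)
 SELECT ml.*
 FROM message_log ml
 WHERE ml.device_id = @IMEI
   AND ml.created BETWEEN @STARTDATE AND DATE_ADD(@STARTDATE, INTERVAL 24 HOUR)
   AND ml.db_id > @LAST_ID
 ORDER BY ml.db_id
 LIMIT 1000;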

6 (Review config): The other piece is to review the non-developer default configuration that is used when MySQL is installed; maybe there are some settings worth tuning? :-)
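
A simple way to start that review is to dump the most relevant server variables and compare them with the documented defaults; a sketch:

 SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%';
 SHOW GLOBAL VARIABLES LIKE 'innodb_log%';
 SHOW GLOBAL VARIABLES LIKE 'innodb_flush%';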

Thanks for reading; I'd really like to hear any suggestions.

FYI, the table and query follow:

Table:

CREATE TABLE `message_log` (
  `db_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `db_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `created` datetime DEFAULT NULL,
  `device_id` int(10) unsigned NOT NULL,
  `display_name` varchar(50) DEFAULT NULL,
  `ignition` binary(1) DEFAULT NULL COMMENT 'This is actually IO8 from the falcom device',
  `sensor_a` float DEFAULT NULL,
  `sensor_b` float DEFAULT NULL,
  `lat` double DEFAULT NULL COMMENT 'default GPRMC format ddmm.mmmm \n',
  `lon` double DEFAULT NULL COMMENT 'default GPRMC longitude format dddmm.mmmm ',
  `heading` float DEFAULT NULL,
  `speed` float DEFAULT NULL,
  `pos_validity` char(1) DEFAULT NULL,
  `device_temp` float DEFAULT NULL,
  `device_volts` float DEFAULT NULL,
  `satellites` smallint(6) DEFAULT NULL, /* TINYINT will suffice */
  `navdist` double DEFAULT NULL,
  `navdist2` double DEFAULT NULL,
  `IO0` binary(1) DEFAULT NULL COMMENT 'Duress',
  `IO1` binary(1) DEFAULT NULL COMMENT 'Fridge On/Off',
  `IO2` binary(1) DEFAULT NULL COMMENT 'Not mapped',
  `msg_name` varchar(20) DEFAULT NULL, /* Will be removed */
  `msg_type` varchar(16) DEFAULT NULL, /* Will be removed */
  `msg_id` smallint(6) DEFAULT NULL,
  `raw` text, /* Not needed in primary query, considering adding to single table mapped to this ID or a UUID correlation ID to save on @ROWID query */
  PRIMARY KEY (`db_id`),
  KEY `Name` (`display_name`),
  KEY `Created` (`created`),
  KEY `DeviceID_AND_Created` (`device_id`,`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DeviceID_AND_Created is the main index. I need the clustered PK because I use the record ID in a pivot table that tracks the last message for each device. created will be the partitioning column, so I assume it will also have to be added to the clustered PK?

QUERY

 SELECT ml.db_id, ml.db_created, ml.created, ml.device_id, ml.display_name,
        bin(ml.ignition) as `ignition`, bin(ml.IO0) as `duress`, bin(ml.IO1) as `fridge`,
        ml.sensor_a, ml.sensor_b, ml.lat, ml.lon, ml.heading, ml.speed,
        ml.pos_validity, ml.satellites, ml.navdist2, ml.navdist,
        ml.device_temp, ml.device_volts, ml.msg_id
 FROM message_log ml
 WHERE ml.device_id = @IMEI
   AND ml.created BETWEEN @STARTDATE AND DATE_ADD(@STARTDATE, INTERVAL 24 HOUR)
 ORDER BY ml.db_id;

This returns all logs for a given 24-hour period, currently approx. 3k to 9k rows. The average row size is 381 bytes, which will shrink once the TEXT (raw) field is removed.

+5
3 answers

Implement date-range partitioning: a) keep monthly partitions, e.g. the last 6 months; b) move anything older to an archive table.

This is a very good idea. I assume all inserts go into the newest partition and you mostly query recent data. You always want a situation where your data and indexes fit into memory, so that there are no disk reads.

Depending on your use case, it may even be wise to have one partition per week. Then you only have to keep a maximum of two weeks of data in memory to read the last 7 days.
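
A sketch of what weekly ranges might look like; the week boundaries are made-up dates, and (as in the accepted solution further down) the partitioning column has to be part of the primary key before this will run:

 ALTER TABLE message_log PARTITION BY RANGE (TO_DAYS(created)) (
   PARTITION w2016_45 VALUES LESS THAN (TO_DAYS('2016-11-14')),
   PARTITION w2016_46 VALUES LESS THAN (TO_DAYS('2016-11-21')),
   PARTITION w2016_47 VALUES LESS THAN (TO_DAYS('2016-11-28')),
   PARTITION future   VALUES LESS THAN (MAXVALUE)
 );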

You should also tune your buffer sizes: innodb_buffer_pool_size if you use InnoDB as the engine, or key_buffer_size (the MyISAM key cache) if you use MyISAM.

Also, adding RAM to the database machine usually helps, since the OS can then keep the data files in its file cache.

If you have a write-heavy workload, you can also tune other parameters (i.e. how often data is flushed to disk, via innodb_log_buffer_size and related settings). The goal is to keep dirty pages in memory longer so that they are not written back to disk too often.
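
By way of illustration (the value below is an example, not a recommendation from this answer), these are the variables usually looked at for a write-heavy InnoDB workload:

 SHOW GLOBAL VARIABLES LIKE 'innodb_log_buffer_size';
 SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';
 -- Dynamic: 2 flushes the redo log roughly once per second instead of on every commit,
 -- trading a little durability for fewer disk writes
 SET GLOBAL innodb_flush_log_at_trx_commit = 2;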

+2

For those who are interested, the following is what I used to create my partitions and configure the memory.

Create Partitions

  • Updated the PK to include the range column used for partitioning

     ALTER TABLE message_log
       CHANGE COLUMN created created DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
       DROP PRIMARY KEY,
       ADD PRIMARY KEY (db_id, created);
  • Partitions were then added using ALTER TABLE (statement below).

In retrospect, I should have created the first partition with a single ALTER statement and then used REORGANIZE PARTITION for each subsequent partition, since doing it all in one hit consumes a lot of resources and time (a sketch of that approach follows the note below).

 ALTER TABLE message_log PARTITION BY RANGE (TO_DAYS(created)) (
   PARTITION invalid    VALUES LESS THAN (0),
   PARTITION from201607 VALUES LESS THAN (TO_DAYS('2016-08-01')),
   PARTITION from201608 VALUES LESS THAN (TO_DAYS('2016-09-01')),
   PARTITION from201609 VALUES LESS THAN (TO_DAYS('2016-10-01')),
   PARTITION from201610 VALUES LESS THAN (TO_DAYS('2016-11-01')),
   PARTITION from201611 VALUES LESS THAN (TO_DAYS('2016-12-01')),
   PARTITION from201612 VALUES LESS THAN (TO_DAYS('2017-01-01')),
   PARTITION from201701 VALUES LESS THAN (TO_DAYS('2017-02-01')),
   PARTITION from201702 VALUES LESS THAN (TO_DAYS('2017-03-01')),
   PARTITION from201703 VALUES LESS THAN (TO_DAYS('2017-04-01')),
   PARTITION from201704 VALUES LESS THAN (TO_DAYS('2017-05-01')),
   PARTITION future     VALUES LESS THAN (MAXVALUE)
 );

NOTE: I am not sure whether using to_days() rather than the raw column makes much difference, but I saw it used in most examples, so I took it as the intended best practice.
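
A sketch of the incremental approach mentioned above: once the catch-all future partition exists, each new month can be split out of it with REORGANIZE PARTITION (the partition name and boundary date here are assumptions):

 ALTER TABLE message_log REORGANIZE PARTITION future INTO (
   PARTITION from201705 VALUES LESS THAN (TO_DAYS('2017-06-01')),
   PARTITION future     VALUES LESS THAN (MAXVALUE)
 );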

Setting the size of the buffer pool

For changing the value of innodb_buffer_pool_size, I found this information useful: MySQL InnoDB Buffer Pool Resize and Rick James' page on memory.

You can also do this in MySQL Workbench via the options file menu and then the InnoDB tab. Any changes you make there are written to the configuration file, but you will need to stop and start MySQL for the configuration to be read; alternatively, you can set the global value to apply it live.
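
A sketch of the live route; online resizing of the pool needs MySQL 5.7 or later, while older versions require the config-file change and restart described above:

 SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;   -- 4G
 SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';    -- resize progress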

+1

What a deal! I get four mentions without even writing a comment or answer. I am writing an answer because I may have some improvements...

Yes, PARTITION BY RANGE(TO_DAYS(...)) is the right way. (There may be a small number of alternatives.)

70% of RAM, i.e. the 4G setting. Make sure there is no swapping.

You mentioned one particular query. If that is the main concern, then this would be slightly better:

 PRIMARY KEY(device_id, created, db_id),  -- desired rows will be clustered
 INDEX(db_id)                             -- to keep AUTO_INCREMENT happy

If you are not purging old data, then the above suggestion gives you the same efficiency even without partitioning.
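
A sketch of how that key layout could be applied (this ALTER is my wording, not part of the answer, and it rebuilds the whole table):

 ALTER TABLE message_log
   DROP PRIMARY KEY,
   ADD PRIMARY KEY (device_id, created, db_id),
   ADD INDEX (db_id);   -- the AUTO_INCREMENT column must remain the first column of some index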

On lat/lon representation: DOUBLE is overkill.

Beware of UUID inefficiencies, especially for huge tables.

+1

Source: https://habr.com/ru/post/1259534/

