MySQL Query Performance - Query / Schema / Indexes?

Basically, I have performance problems with queries, mainly against my largest table, which stores call data.

The main query contains quite a few left joins and subselects, but in the scenario I am running, where I expect 1.3M call records to be returned, the query simply never finishes. I end up killing it after 7 minutes, which means there is definitely a problem.

I narrowed the main query down and tested the simplest subselect, which is:

SELECT DateStart, ID, NumbID, EffectiveFlag, OrigNumber
FROM calls
WHERE DateStart <= '2013-12-31'
  AND DateStart >= '2013-01-01'
  AND CallLength >= '00:00:00'
  AND Direction = '1'
  AND CustID IN (474,482,250,268,197,604,132,359,279,441,118,448,152,133,380,162,249,679,226,259,2450,2408,2451,2453,2439,2454,2444,2445,2452)

Even this query takes 4.5s, so when it is a subselect in a query with other joins and subselects, I can see why the query as a whole never completes.

EXPLAIN output for the query above

+----+-------------+-------+-------+-----------------------------------------------------------------------------------+----------------------+---------+------+---------+-------------+
| id | select_type | table | type  | possible_keys                                                                     | key                  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+-------+-----------------------------------------------------------------------------------+----------------------+---------+------+---------+-------------+
|  1 | SIMPLE      | calls | range | idx_CustID,idx_DateStart,idx_CustID_DateStart,idx_CustID_TermNumber,idx_Direction | idx_CustID_DateStart | 7       | NULL | 1660009 | Using where |
+----+-------------+-------+-------+-----------------------------------------------------------------------------------+----------------------+---------+------+---------+-------------+

Schema of the calls table

+-------------------+-------------+------+-----+---------------------+----------------+
| Field             | Type        | Null | Key | Default             | Extra          |
+-------------------+-------------+------+-----+---------------------+----------------+
| ID                | int(11)     | NO   | PRI | NULL                | auto_increment |
| CustID            | int(11)     | NO   | MUL | 0                   |                |
| CarrID            | int(11)     | NO   | MUL | NULL                |                |
| TariID            | int(11)     | NO   | MUL | 0                   |                |
| CarrierRef        | varchar(30) | NO   | MUL |                     |                |
| NumbID            | int(11)     | NO   | MUL | 0                   |                |
| VlviID            | int(11)     | NO   | MUL | NULL                |                |
| VcamID            | int(11)     | NO   | MUL | NULL                |                |
| SomeID            | int(11)     | NO   | MUL | NULL                |                |
| VlnsID            | int(11)     | NO   | MUL | NULL                |                |
| NGNumber          | varchar(12) | NO   |     |                     |                |
| OrigNumber        | varchar(16) | NO   | MUL | NULL                |                |
| CLIRestrictedFlag | int(2)      | NO   |     | NULL                |                |
| OrigLocality      | varchar(11) | NO   | MUL |                     |                |
| OrigAreaCode      | varchar(11) | NO   | MUL |                     |                |
| TermNumber        | varchar(16) | NO   | MUL | NULL                |                |
| BatchNumber       | varchar(10) | NO   | MUL |                     |                |
| DateStart         | date        | NO   | MUL | 0000-00-00          |                |
| DateClear         | date        | NO   |     | 0000-00-00          |                |
| TimeStart         | time        | NO   |     | 00:00:00            |                |
| TimeClear         | time        | NO   |     | 00:00:00            |                |
| CallLength        | time        | NO   |     | 00:00:00            |                |
| RingLength        | time        | NO   |     | 00:00:00            |                |
| EffectiveFlag     | smallint(1) | NO   | MUL | NULL                |                |
| UnansweredFlag    | smallint(1) | NO   | MUL | NULL                |                |
| EngagedFlag       | smallint(1) | NO   |     | NULL                |                |
| RecID             | int(11)     | NO   | MUL | NULL                |                |
| CreatedUserID     | int(11)     | NO   |     | 0                   |                |
| CreatedDatetime   | datetime    | NO   | MUL | 0000-00-00 00:00:00 |                |
| Direction         | int(1)      | NO   | MUL | NULL                |                |
+-------------------+-------------+------+-----+---------------------+----------------+

Indexes on the calls table

+-------+------------+---------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name                  | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| calls |          0 | PRIMARY                   |            1 | ID              | A         |    23905312 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_CustID                |            1 | CustID          | A         |        1685 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_NumbID                |            1 | NumbID          | A         |       37765 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_OrigNumber            |            1 | OrigNumber      | A         |     5976328 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_OrigLocality          |            1 | OrigLocality    | A         |       45019 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_OrigAreaCode          |            1 | OrigAreaCode    | A         |         846 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_TermNumber            |            1 | TermNumber      | A         |      232090 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_DateStart             |            1 | DateStart       | A         |        4596 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_EffectiveFlag         |            1 | EffectiveFlag   | A         |           2 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_UnansweredFlag        |            1 | UnansweredFlag  | A         |           2 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_EngagedFlag           |            1 | UnansweredFlag  | A         |           2 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_TariID                |            1 | TariID          | A         |         110 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_CustID_DateStart      |            1 | CustID          | A         |        1685 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_CustID_DateStart      |            2 | DateStart       | A         |      919435 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_NumbID_DateStart      |            1 | NumbID          | A         |       37765 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_NumbID_DateStart      |            2 | DateStart       | A         |     5976328 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_RecID                 |            1 | RecID           | A         |      288015 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_CarrierRef            |            1 | CarrierRef      | A         |     7968437 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_CustID_CallTermNumber |            1 | CustID          | A         |        1685 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_CustID_CallTermNumber |            2 | TermNumber      | A         |      246446 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_CreatedDatetime       |            1 | CreatedDatetime | A         |      771139 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_Direction             |            1 | Direction       | A         |           2 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_VlviID                |            1 | VlviID          | A         |       50539 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_SomeID                |            1 | SomeID          | A         |          30 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_VcamID                |            1 | VcamID          | A         |          64 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_VlnsID                |            1 | VlnsID          | A         |         191 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_CarrID                |            1 | CarrID          | A         |           4 | NULL     | NULL   |      | BTREE      |         |
| calls |          1 | idx_BatchNumber           |            1 | BatchNumber     | A         |      271651 | NULL     | NULL   |      | BTREE      |         |
+-------+------------+---------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+

One thing I understand can hurt performance is indexes on low-cardinality columns. I know that a column like Direction, which has a cardinality of 2, is probably worse off with an index in terms of performance, but this alone should not make the statement so slow.

Regarding the cardinality needed for an index to be worthwhile: is there a general ratio of cardinality to total table rows above which an index improves performance and below which it hurts?

I understand that nobody can give me an answer that takes the query time from 4.5s to 0.01s, but any advice on the query itself, the table layout, the indexes, or the hardware would be very welcome.

Update:

@Sebas: "Could you rerun the query and the EXPLAIN plan without the part AND CallLength >= '00:00:00' AND Direction = '1', please?"

+----+-------------+-------+-------+---------------------------------------------------------------------+----------------------+---------+------+--------+-------------+
| id | select_type | table | type  | possible_keys                                                       | key                  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------------------------------------------------------------+----------------------+---------+------+--------+-------------+
|  1 | SIMPLE      | calls | range | idx_CustID,idx_DateStart,idx_CustID_DateStart,idx_CustID_TermNumber | idx_CustID_DateStart | 7       | NULL | 724813 | Using where |
+----+-------------+-------+-------+---------------------------------------------------------------------+----------------------+---------+------+--------+-------------+
+6
6 answers

Is your "DateStart" a truncated datetime, that is, a date only? If not, you could add a column holding the truncated value (by day or by hour) and store it as an int, which would make the index much smaller and the query faster.


Or, another optimization method (golden rule #1: don't do it; #2: don't do it yet).

If and only if your date and PK increase together (rows are inserted in date order), you can create an external lookup table that maps each StartDate to a range of IDs (PK),

and use the pattern below:

SELECT @start := ID_START FROM ANOTHER_TABLE WHERE StartDate = '2013-01-01';
SELECT @end := ID_END FROM ANOTHER_TABLE WHERE StartDate = '2013-12-31';
SELECT * FROM calls WHERE ID BETWEEN @start AND @end AND CustId IN (xxxxx) ....

With the pattern above, MySQL knows that only a segment of the table needs to be scanned.
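The lookup table itself could be built along these lines. This is only a sketch: the table name call_id_ranges is hypothetical (it stands in for ANOTHER_TABLE above), and it is only valid under the stated assumption that ID grows in step with DateStart.

```sql
-- Hypothetical lookup table: one row per day with the first and last calls.ID seen that day.
-- Only meaningful if rows were inserted in date order, so that ID ranges of different days do not overlap.
CREATE TABLE call_id_ranges (
    StartDate DATE NOT NULL PRIMARY KEY,
    ID_START  INT  NOT NULL,
    ID_END    INT  NOT NULL
);

-- Populate it from the existing data (this scans calls once).
INSERT INTO call_id_ranges (StartDate, ID_START, ID_END)
SELECT DateStart, MIN(ID), MAX(ID)
FROM calls
GROUP BY DateStart;
```

Note that the two lookups in the pattern only return a value if a row exists for exactly that day, so a day without calls would need to be handled separately.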

+4

As Darhazer said, you have too many indexes. Start by dropping all of them and create them again based on your needs.

For this particular query, create one INDEX with these fields in it:

(DateStart, CallLength, Direction, CustID)

Change AND Direction = '1' to AND Direction = 1 (remove the quotes, you are comparing an integer, not a string)

Then see what this does to your query time. If it is good, add the subquery back, check it again with EXPLAIN, add the necessary indexes, and so on.
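As a rough sketch of the suggestion in this answer, the index and the check could look like this. The index name is made up for illustration, and the customer list is shortened.

```sql
-- Hypothetical index name; the column order follows the answer above.
ALTER TABLE calls
    ADD INDEX idx_DateStart_CallLength_Direction_CustID
        (DateStart, CallLength, Direction, CustID);

-- Re-check the plan, with Direction compared as an integer.
EXPLAIN
SELECT DateStart, ID, NumbID, EffectiveFlag, OrigNumber
FROM calls
WHERE DateStart >= '2013-01-01'
  AND DateStart <= '2013-12-31'
  AND CallLength >= '00:00:00'
  AND Direction = 1
  AND CustID IN (474, 482, 250);  -- shortened list, for illustration only
```

Whether the optimizer actually picks the new index, and whether it helps, depends on the data, so compare the `key` and `rows` columns with the EXPLAIN output in the question.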

+3

The best index for your query to pick is idx_CustID_DateStart. The IN list prevents this. If the CustID list comes from a table, I suggest a JOIN against that table rather than listing the values.
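A minimal sketch of the JOIN form, assuming the customer IDs live in (or are loaded into) a table. The table report_customers is hypothetical and not part of the schema in the question.

```sql
-- Hypothetical table holding the customers of interest.
CREATE TABLE report_customers (
    CustID INT NOT NULL PRIMARY KEY
);

-- Same filter as the original query, with the IN list replaced by a join.
SELECT c.DateStart, c.ID, c.NumbID, c.EffectiveFlag, c.OrigNumber
FROM calls AS c
JOIN report_customers AS rc
    ON rc.CustID = c.CustID
WHERE c.DateStart BETWEEN '2013-01-01' AND '2013-12-31'
  AND c.CallLength >= '00:00:00'
  AND c.Direction = 1;
```

Running EXPLAIN on both forms shows whether the join actually changes the plan for this data.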

+3

I'm not sure the original query, which takes more than 7 minutes, is written correctly if you are worrying about a subquery that takes 5 seconds (I hope it is not executed for each row). But anyway, if you want to speed it up, you should read up on how indexes work. I would recommend this article to get you started.

Basically, you have conditions on 4 fields, and on two of them the conditions are ranges. If you read the article, you will know that an index is used efficiently only up to the first range condition. The remaining columns of the index can still be used while scanning the index, though. So you need to choose which condition narrows the result set better: the one on DateStart or the one on CallLength.

In any case, you need a composite index starting with (CustID, Direction, ...). My feeling is that the condition on DateStart is the better one. So I would start with (CustID, Direction, DateStart, CallLength) and compare it with (CustID, Direction, DateStart), because the last field may not give enough of a performance gain while still costing memory.
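One way to run that comparison, sketched with hypothetical index names, is to create both candidates and force each one in turn with an index hint:

```sql
-- Two candidate composite indexes (names are illustrative only).
ALTER TABLE calls
    ADD INDEX idx_cand_4col (CustID, Direction, DateStart, CallLength),
    ADD INDEX idx_cand_3col (CustID, Direction, DateStart);

-- Run once per candidate, swapping the name in FORCE INDEX,
-- and compare the estimated rows and the measured run time.
EXPLAIN
SELECT DateStart, ID, NumbID, EffectiveFlag, OrigNumber
FROM calls FORCE INDEX (idx_cand_4col)
WHERE CustID IN (474, 482, 250)  -- shortened list, for illustration only
  AND Direction = 1
  AND DateStart BETWEEN '2013-01-01' AND '2013-12-31'
  AND CallLength >= '00:00:00';
```

Once the better candidate is clear, the other index should be dropped again so it does not add write overhead.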

I still think, though, that you need to make sure the rest of the query is written correctly instead of focusing on the subquery. There may be a more efficient way to structure the query that makes this optimization irrelevant.

+2

4.5s is not much for returning 1.6M rows; I am sure most of it was spent on I/O. That leaves practically no room for optimization. You would be better off posting your original query, and perhaps we can help you more with that.

What percentage of the total is 1.6 million? Indexes are good when they are used to return a small part of the data set, but since their data access pattern is random reads, it is sometimes more efficient to do a full scan of the table. Of course, this depends on how the data was added to the table and how the space was allocated on disk.
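Taking the figures in the question at face value, EXPLAIN estimates 1,660,009 rows and the PRIMARY index reports a cardinality of 23,905,312, which works out to roughly 7% of the table. Both numbers are optimizer estimates rather than exact counts, so this is only a ballpark:

```sql
-- Ballpark share of the table touched, using the estimates from EXPLAIN and SHOW INDEX.
SELECT 1660009 / 23905312 * 100 AS pct_estimate;
```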

You may also find it useful to trace performance with the MySQL Performance Schema; see here for details.

+2

You have too many indexes. For example, you do not need a separate CustID index because CustID is the leftmost column of the (CustID, DateStart) index. You have 2 indexes on UnansweredFlag. And do you really need all of these indexes? They not only slow down inserts and updates, they also slow down query optimization and can trick the optimizer into choosing a not-so-good index.
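For the two redundancies named above, the cleanup could be sketched as follows. The index names are taken from the SHOW INDEX output in the question, which lists idx_EngagedFlag on the UnansweredFlag column; check that no other query depends on these indexes before dropping anything.

```sql
-- idx_CustID is a left prefix of idx_CustID_DateStart (CustID, DateStart).
-- idx_EngagedFlag is listed on UnansweredFlag, duplicating idx_UnansweredFlag.
ALTER TABLE calls
    DROP INDEX idx_CustID,
    DROP INDEX idx_EngagedFlag;
```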

Now, for the specific query. You need to find out which field or combination of fields restricts the query the most (right now it scans 1.6M rows!) and force it to use that index. So run SELECT COUNT(*) queries for each of the WHERE conditions (Direction, CallLength) together with the given DateStart range (which you always want to filter on). Perhaps you just need to add Direction to the index.
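The counts can be gathered in one pass over the date range. This sketch relies on MySQL evaluating a comparison to 0 or 1, so SUM counts the matching rows; the customer list is shortened for illustration.

```sql
-- How selective is each condition within the date range?
SELECT COUNT(*)                       AS in_date_range,
       SUM(Direction = 1)             AS direction_1,
       SUM(CallLength >= '00:00:00')  AS call_length_ok,
       SUM(CustID IN (474, 482, 250)) AS in_customer_list
FROM calls
WHERE DateStart BETWEEN '2013-01-01' AND '2013-12-31';
```

A condition whose count is close to in_date_range filters almost nothing and is a poor candidate for an index column.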

Also, before MySQL 5.6, subqueries in the WHERE clause are not optimized, so you may need to rewrite the whole query to use a join instead of the subquery rather than optimizing this specific query.

+1

Source: https://habr.com/ru/post/956098/

