Why are there many changes in MySQL performance (InnoDB)?

I began to explore why some searches in the Django admin are very slow (see here ). Digging further, I found that the performance of MySQL (5.1, InnoDB) varies greatly from one query to another. For instance:

This request (looking for 'c', 'd' and 'e' in 4 fields, 2 related) generated by Django, takes 89 ms and returns 3093 lines:

SELECT DISTINCT `donnees_artiste`.`id` FROM `donnees_artiste` LEFT OUTER JOIN `donnees_artiste_evenements` ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T4 ON (`donnees_artiste`.`id` = T4.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T5 ON (T4.`evenement_id` = T5.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T6 ON (`donnees_artiste`.`id` = T6.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T7 ON (T6.`evenement_id` = T7.`id`) WHERE ( (`donnees_artiste`.`nom` LIKE '%c%' OR `donnees_artiste`.`prenom` LIKE '%c%' OR `donnees_evenement`.`cote` LIKE '%c%' OR `donnees_evenement`.`titre` LIKE '%c%' ) AND (`donnees_artiste`.`nom` LIKE '%d%' OR `donnees_artiste`.`prenom` LIKE '%d%' OR T5.`cote` LIKE '%d%' OR T5.`titre` LIKE '%d%' ) AND (`donnees_artiste`.`nom` LIKE '%e%' OR `donnees_artiste`.`prenom` LIKE '%e%' OR T7.`cote` LIKE '%e%' OR T7.`titre` LIKE '%e%' ) ); 

If I replaced "e" with "k", so basically the same request, it takes 8720 ms (an increase of 100 times) and returns 931 lines.

 SELECT DISTINCT `donnees_artiste`.`id` FROM `donnees_artiste` LEFT OUTER JOIN `donnees_artiste_evenements` ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T4 ON (`donnees_artiste`.`id` = T4.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T5 ON (T4.`evenement_id` = T5.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T6 ON (`donnees_artiste`.`id` = T6.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T7 ON (T6.`evenement_id` = T7.`id`) WHERE ( (`donnees_artiste`.`nom` LIKE '%c%' OR `donnees_artiste`.`prenom` LIKE '%c%' OR `donnees_evenement`.`cote` LIKE '%c%' OR `donnees_evenement`.`titre` LIKE '%c%' ) AND (`donnees_artiste`.`nom` LIKE '%d%' OR `donnees_artiste`.`prenom` LIKE '%d%' OR T5.`cote` LIKE '%d%' OR T5.`titre` LIKE '%d%' ) AND (`donnees_artiste`.`nom` LIKE '%k%' OR `donnees_artiste`.`prenom` LIKE '%k%' OR T7.`cote` LIKE '%k%' OR T7.`titre` LIKE '%k%' ) ); 

Both of these queries give the same EXPLAIN , so there is no clue.

 ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS EXTRA 1 SIMPLE donnees_artiste ALL None None None None 4368 Using temporary; Using filesort 1 SIMPLE donnees_artiste_evenements ref artiste_id,donnees_artiste_evenements_eb99df11 artiste_id 4 mmac.donnees_artiste.id 1 Using index; Distinct 1 SIMPLE donnees_evenement eq_ref PRIMARY,donnees_evenements_id_index PRIMARY 4 mmac.donnees_artiste_evenements.evenement_id 1 Using where; Distinct 1 SIMPLE T4 ref artiste_id,donnees_artiste_evenements_eb99df11 artiste_id 4 mmac.donnees_artiste.id 1 Using index; Distinct 1 SIMPLE T5 eq_ref PRIMARY,donnees_evenements_id_index PRIMARY 4 mmac.T4.evenement_id 1 Using where; Distinct 1 SIMPLE T6 ref artiste_id,donnees_artiste_evenements_eb99df11 artiste_id 4 mmac.donnees_artiste.id 1 Using index; Distinct 1 SIMPLE T7 eq_ref PRIMARY,donnees_evenements_id_index PRIMARY 4 mmac.T6.evenement_id 1 Using where; Distinct 

Also, if I do a COUNT for the first request, it takes 11200 ms.

 SELECT COUNT(DISTINCT `donnees_artiste`.`id`) FROM `donnees_artiste` LEFT OUTER JOIN `donnees_artiste_evenements` ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T4 ON (`donnees_artiste`.`id` = T4.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T5 ON (T4.`evenement_id` = T5.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T6 ON (`donnees_artiste`.`id` = T6.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T7 ON (T6.`evenement_id` = T7.`id`) WHERE ( (`donnees_artiste`.`nom` LIKE '%c%' OR `donnees_artiste`.`prenom` LIKE '%c%' OR `donnees_evenement`.`cote` LIKE '%c%' OR `donnees_evenement`.`titre` LIKE '%c%' ) AND (`donnees_artiste`.`nom` LIKE '%d%' OR `donnees_artiste`.`prenom` LIKE '%d%' OR T5.`cote` LIKE '%d%' OR T5.`titre` LIKE '%d%' ) AND (`donnees_artiste`.`nom` LIKE '%e%' OR `donnees_artiste`.`prenom` LIKE '%e%' OR T7.`cote` LIKE '%e%' OR T7.`titre` LIKE '%e%' ) ); 

My innodb_buffer_pool_size set to high. I have indexes in all the corresponding fields and primary keys, and I have already optimized my tables.

So why is the first request running so fast and the other 2 so slow? These 3 queries are just examples. Many times, I simply change or delete one character from the request, and this has greatly affected the time of the request. But I do not see any picture.

UPDATE

The performance issue is definitely related to how Django generates these requests. All of these redundant LEFT OUTER JOIN chained together kill performance. At the moment, it’s not entirely clear to me if this is an error in the Django SQL generator, an error in building a query for the search field, or if everything works as expected by the Django developers. I am still investigating, but at least there is one strange thing in Django's behavior ...

If I run this query (this is not necessarily equivalent to the second, but not far), the results will be pretty fast (161 ms, no cache):

 SELECT DISTINCT `donnees_artiste`.`id` FROM `donnees_artiste` LEFT OUTER JOIN `donnees_artiste_evenements` ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`) WHERE ( (`donnees_artiste`.`nom` LIKE '%c%' OR `donnees_artiste`.`prenom` LIKE '%c%' OR `donnees_evenement`.`cote` LIKE '%c%' OR `donnees_evenement`.`titre` LIKE '%c%' ) AND (`donnees_artiste`.`nom` LIKE '%d%' OR `donnees_artiste`.`prenom` LIKE '%d%' OR `donnees_evenement`.`cote` LIKE '%d%' OR `donnees_evenement`.`titre` LIKE '%d%' ) AND (`donnees_artiste`.`nom` LIKE '%k%' OR `donnees_artiste`.`prenom` LIKE '%k%' OR `donnees_evenement`.`cote` LIKE '%k%' OR `donnees_evenement`.`titre` LIKE '%k%' ) ); 

SECOND UPDATE

Finally, this is not a bug in Django, I am sure that this is the desired behavior. The idea is that when searching in several terms, the search for the next term is performed by returning the subset to the previous member, so for related fields all members should not be on the same line in order to have a match. To do this, you need to create a temporary table with each subset and scan it. This explains why there can be many variations, because if the first term matches only a few rows, the temporary table will be small and the search for the next term will be quick (because they will be performed on a small table). The difference between the two requests is subtle, but a Django request can return more matches in general.

+1
source share
3 answers

I think the answer is that in most cases e is at the beginning of the scanned lines and in the first line to be searched, which allows us to circulate OR conditions briefly, while matches for k occur in the last conditions and somewhere at the end of the line. And since there are significantly fewer lines with k , more lines need to be completely scanned without any coincidence.

+2
source

If you use the LIKE template with the main template, your query will not be useful to the index. Using LIKE in this way can be very inefficient, and its runtime can vary greatly. Why?

  • The algorithm underlying the LIKE statement stops the search for a string if it encounters a match.
  • In this scenario (without an index), MySQL uses some other boosting algorithms that may or may not be applicable in some cases.

Why does using COUNT in the third query slow it down?

I see that you are using innoDB.

innoDB does not read the number of rows from the stored / cached value, for example MyISAM (if the column is NOT NULL) leads to the fact that innoDB is more optimized for "writing" than "reading" (against MyISAM). Using COUNT in the innoDB table either performs a full table scan or a full index scan every time you do this.

Your queries do not use any index, and this may be the worst case, so a full table scan occurs (and yes, it is as slow as it seems).

I think you might be interested: MySQL indexes

+2
source

Type conditions:

 WHERE column LIKE '%c%' 

cannot use index on column . So these columns should be fully scanned.

You have several such conditions, you use OR between them (which ensures that all of these tables are scanned). And finally, you (rather: Django) added DISTINCT , which probably requires a final file before returning the results.

I can not find an explanation for the huge difference in performance (100x). Perhaps the first request was cached. Can you try adding ORDER NY NULL at the end of the queries and time them?

The generated query is also not well designed, because it is likely to end with a mini-Cartesian connection. You join a base table to several tables in a 1-to-many relationship with the base table. That the reason for the poor performance and the query plan would help clarify this.

+1
source

Source: https://habr.com/ru/post/1401010/


All Articles