I began to explore why some searches in the Django admin are very slow (see here ). Digging further, I found that the performance of MySQL (5.1, InnoDB) varies greatly from one query to another. For instance:
This request (looking for 'c', 'd' and 'e' in 4 fields, 2 related) generated by Django, takes 89 ms and returns 3093 lines:
SELECT DISTINCT `donnees_artiste`.`id` FROM `donnees_artiste` LEFT OUTER JOIN `donnees_artiste_evenements` ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T4 ON (`donnees_artiste`.`id` = T4.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T5 ON (T4.`evenement_id` = T5.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T6 ON (`donnees_artiste`.`id` = T6.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T7 ON (T6.`evenement_id` = T7.`id`) WHERE ( (`donnees_artiste`.`nom` LIKE '%c%' OR `donnees_artiste`.`prenom` LIKE '%c%' OR `donnees_evenement`.`cote` LIKE '%c%' OR `donnees_evenement`.`titre` LIKE '%c%' ) AND (`donnees_artiste`.`nom` LIKE '%d%' OR `donnees_artiste`.`prenom` LIKE '%d%' OR T5.`cote` LIKE '%d%' OR T5.`titre` LIKE '%d%' ) AND (`donnees_artiste`.`nom` LIKE '%e%' OR `donnees_artiste`.`prenom` LIKE '%e%' OR T7.`cote` LIKE '%e%' OR T7.`titre` LIKE '%e%' ) );
If I replaced "e" with "k", so basically the same request, it takes 8720 ms (an increase of 100 times) and returns 931 lines.
SELECT DISTINCT `donnees_artiste`.`id` FROM `donnees_artiste` LEFT OUTER JOIN `donnees_artiste_evenements` ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T4 ON (`donnees_artiste`.`id` = T4.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T5 ON (T4.`evenement_id` = T5.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T6 ON (`donnees_artiste`.`id` = T6.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T7 ON (T6.`evenement_id` = T7.`id`) WHERE ( (`donnees_artiste`.`nom` LIKE '%c%' OR `donnees_artiste`.`prenom` LIKE '%c%' OR `donnees_evenement`.`cote` LIKE '%c%' OR `donnees_evenement`.`titre` LIKE '%c%' ) AND (`donnees_artiste`.`nom` LIKE '%d%' OR `donnees_artiste`.`prenom` LIKE '%d%' OR T5.`cote` LIKE '%d%' OR T5.`titre` LIKE '%d%' ) AND (`donnees_artiste`.`nom` LIKE '%k%' OR `donnees_artiste`.`prenom` LIKE '%k%' OR T7.`cote` LIKE '%k%' OR T7.`titre` LIKE '%k%' ) );
Both of these queries give the same EXPLAIN , so there is no clue.
ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS EXTRA 1 SIMPLE donnees_artiste ALL None None None None 4368 Using temporary; Using filesort 1 SIMPLE donnees_artiste_evenements ref artiste_id,donnees_artiste_evenements_eb99df11 artiste_id 4 mmac.donnees_artiste.id 1 Using index; Distinct 1 SIMPLE donnees_evenement eq_ref PRIMARY,donnees_evenements_id_index PRIMARY 4 mmac.donnees_artiste_evenements.evenement_id 1 Using where; Distinct 1 SIMPLE T4 ref artiste_id,donnees_artiste_evenements_eb99df11 artiste_id 4 mmac.donnees_artiste.id 1 Using index; Distinct 1 SIMPLE T5 eq_ref PRIMARY,donnees_evenements_id_index PRIMARY 4 mmac.T4.evenement_id 1 Using where; Distinct 1 SIMPLE T6 ref artiste_id,donnees_artiste_evenements_eb99df11 artiste_id 4 mmac.donnees_artiste.id 1 Using index; Distinct 1 SIMPLE T7 eq_ref PRIMARY,donnees_evenements_id_index PRIMARY 4 mmac.T6.evenement_id 1 Using where; Distinct
Also, if I do a COUNT for the first request, it takes 11200 ms.
SELECT COUNT(DISTINCT `donnees_artiste`.`id`) FROM `donnees_artiste` LEFT OUTER JOIN `donnees_artiste_evenements` ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T4 ON (`donnees_artiste`.`id` = T4.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T5 ON (T4.`evenement_id` = T5.`id`) LEFT OUTER JOIN `donnees_artiste_evenements` T6 ON (`donnees_artiste`.`id` = T6.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` T7 ON (T6.`evenement_id` = T7.`id`) WHERE ( (`donnees_artiste`.`nom` LIKE '%c%' OR `donnees_artiste`.`prenom` LIKE '%c%' OR `donnees_evenement`.`cote` LIKE '%c%' OR `donnees_evenement`.`titre` LIKE '%c%' ) AND (`donnees_artiste`.`nom` LIKE '%d%' OR `donnees_artiste`.`prenom` LIKE '%d%' OR T5.`cote` LIKE '%d%' OR T5.`titre` LIKE '%d%' ) AND (`donnees_artiste`.`nom` LIKE '%e%' OR `donnees_artiste`.`prenom` LIKE '%e%' OR T7.`cote` LIKE '%e%' OR T7.`titre` LIKE '%e%' ) );
My innodb_buffer_pool_size set to high. I have indexes in all the corresponding fields and primary keys, and I have already optimized my tables.
So why is the first request running so fast and the other 2 so slow? These 3 queries are just examples. Many times, I simply change or delete one character from the request, and this has greatly affected the time of the request. But I do not see any picture.
UPDATE
The performance issue is definitely related to how Django generates these requests. All of these redundant LEFT OUTER JOIN chained together kill performance. At the moment, it’s not entirely clear to me if this is an error in the Django SQL generator, an error in building a query for the search field, or if everything works as expected by the Django developers. I am still investigating, but at least there is one strange thing in Django's behavior ...
If I run this query (this is not necessarily equivalent to the second, but not far), the results will be pretty fast (161 ms, no cache):
SELECT DISTINCT `donnees_artiste`.`id` FROM `donnees_artiste` LEFT OUTER JOIN `donnees_artiste_evenements` ON (`donnees_artiste`.`id` = `donnees_artiste_evenements`.`artiste_id`) LEFT OUTER JOIN `donnees_evenement` ON (`donnees_artiste_evenements`.`evenement_id` = `donnees_evenement`.`id`) WHERE ( (`donnees_artiste`.`nom` LIKE '%c%' OR `donnees_artiste`.`prenom` LIKE '%c%' OR `donnees_evenement`.`cote` LIKE '%c%' OR `donnees_evenement`.`titre` LIKE '%c%' ) AND (`donnees_artiste`.`nom` LIKE '%d%' OR `donnees_artiste`.`prenom` LIKE '%d%' OR `donnees_evenement`.`cote` LIKE '%d%' OR `donnees_evenement`.`titre` LIKE '%d%' ) AND (`donnees_artiste`.`nom` LIKE '%k%' OR `donnees_artiste`.`prenom` LIKE '%k%' OR `donnees_evenement`.`cote` LIKE '%k%' OR `donnees_evenement`.`titre` LIKE '%k%' ) );
SECOND UPDATE
Finally, this is not a bug in Django, I am sure that this is the desired behavior. The idea is that when searching in several terms, the search for the next term is performed by returning the subset to the previous member, so for related fields all members should not be on the same line in order to have a match. To do this, you need to create a temporary table with each subset and scan it. This explains why there can be many variations, because if the first term matches only a few rows, the temporary table will be small and the search for the next term will be quick (because they will be performed on a small table). The difference between the two requests is subtle, but a Django request can return more matches in general.