Optimization of a slow query to choose MySQL

EDIT . Having looked at some of the answers here and at the hours of research, my team came to the conclusion that, most likely, there is no way to optimize this further than the 4.5 seconds that we were able to achieve (if possible, with separation on clicks sentences, but this will have some ugly side effects). In the end, after a lot of brainstorming, we decided to separate both requests, create two sets of user identifiers (one from the user table and one from clicks_ suggestions) and compare them with the set in Python. The set of identifiers from the user table is still pulled from SQL, but we decided to move offer_clicks to Lucene, and also added some caching on top of it, so now from a different set of identifiers. The end result is that it is about half a second with a cache and 0.9 seconds without a cache.

Start of original post: I have a problem with query optimization. The first version of the request is fine, but the moment of clic_suggest is connected in the second request, the request becomes rather slow. The user table contains 10 million rows, contains_clicks contains 53 million rows.

Allowable performance:

SELECT count(distinct(users.id)) AS count_1 FROM users USE index (country_2) WHERE users.country = 'US' AND users.last_active > '2015-02-26'; 1 row in set (0.35 sec) 

Bad:

 SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country = 'US' AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; 1 row in set (7.39 sec) 

Here's how it looks without specific indexes (worse):

 SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; 1 row in set (17.72 sec) 

I explain:

 explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; +----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+ | 1 | SIMPLE | users | range | country_2 | country_2 | 14 | NULL | 245014 | Using where; Using index | | 1 | SIMPLE | offers_clicks | ref | user_id_3 | user_id_3 | 4 | dejong_pointstoshop.users.id | 270153 | Using where; Using index | +----+-------------+---------------+-------+---------------+-----------+---------+------------------------------+--------+--------------------------+ 

Explain without specifying any indexes:

 mysql> explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; +----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+ | 1 | SIMPLE | users | range | PRIMARY,last_active,country,last_active_2,country_2 | country_2 | 14 | NULL | 221606 | Using where; Using index | | 1 | SIMPLE | offers_clicks | ref | user_id,user_id_2,date,date_2,date_3,ranking_score,user_id_3,user_id_4 | user_id_2 | 4 | dejong_pointstoshop.users.id | 3 | Using where | +----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+ 

Here is a whole group of indexes that I tried with not too much success:

 +---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | offers_clicks | 1 | user_id_3 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_3 | 2 | ranking_score | A | 198 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_3 | 3 | date | A | 198 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_2 | 1 | user_id | A | 17838712 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_2 | 2 | date | A | 53516137 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_4 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_4 | 2 | date | A | 198 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_4 | 3 | ranking_score | A | 198 | NULL | NULL | | BTREE | | | | users | 1 | country_2 | 1 | country | A | 14 | NULL | NULL | | BTREE | | | | users | 1 | country_2 | 2 | last_active | A | 8048529 | NULL | NULL | | BTREE | | | 

Simplified user schema:

 +---------------------------------+---------------+------+-----+---------------------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------------------------------+---------------+------+-----+---------------------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | country | char(2) | NO | MUL | | | | last_active | datetime | NO | MUL | 2000-01-01 00:00:00 | | 

Simplified click-through scheme of offers:

 +-----------------+------------------+------+-----+---------------------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------------+------------------+------+-----+---------------------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | user_id | int(11) | NO | MUL | 0 | | | offer_id | int(11) unsigned | NO | MUL | NULL | | | date | datetime | NO | MUL | 0000-00-00 00:00:00 | | | ranking_score | decimal(5,2) | NO | MUL | 0.00 | | 
+6
source share
6 answers

This is your request:

 SELECT count(distinct u.id) AS count_1 FROM offers_clicks oc JOIN users u ON oc.user_id = u.id WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND oc.date > '2015-02-14' AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49; 

First, instead of count(distinct) you can write the query as:

 SELECT count(*) AS count_1 FROM users u WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND EXISTS (SELECT 1 FROM offers_clicks oc WHERE oc.user_id = u.id AND oc.date > '2015-02-14' AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49 ) 

Then the best indexes for this query are users(country, last_active, id) and either offers_clicks(user_id, date, ranking_score) or offers_clicks(user_id, ranking_score, date) .

+5
source
 SELECT count(distinct u.id) AS count_1 FROM users u STRAIGHT_JOIN offers_clicks oc ON oc.user_id = u.id WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND oc.date > '2015-02-14' AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49; 

Make sure you have an index for users - ( id , last_active , country ) columns and user_id - ( user_id , date , ranking_score )

Or you can cancel the order

 SELECT count(distinct u.id) AS count_1 FROM offers_clicks oc STRAIGHT_JOIN users u ON oc.user_id = u.id WHERE u.country IN ('US') AND u.last_active > '2015-02-26' AND oc.date > '2015-02-14' AND oc.ranking_score > 0.24 AND oc.ranking_score < 3.49; 

Make sure you have an index in the offers_clicks column - ( user_id ) and users - ( id , last_active , country )

+1
source
 SELECT count(users.id) AS count_1 FROM users INNER JOIN (SELECT DISTINCT user_id FROM offers_clicks WHERE offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24 ) as clicks ON clicks.user_id = users.id WHERE users.country IN ('US') AND users.last_active > '2015-02-26' 

Could you provide sqlfiddle with some data?

and could you tell me what the lead time for this request is:

 SELECT DISTINCT user_id FROM offers_clicks WHERE offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24 

EDIT QUESTION How long does this take?

 SELECT DISTINCT user_id FROM offers_clicks USE INDEX (user_id_4) WHERE offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24 
0
source

Try to do it differently:

 SELECT COUNT(users.id) FROM users, offers_clicks WHERE users.country = 'US' AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; 
0
source

Try the following:

 SELECT count(distinct users.id) AS count_1 FROM users USE index (<see below>) JOIN offers_clicks USE index (<see below>) ON offers_clicks.user_id = users.id AND offers_clicks.date BETWEEN '2015-02-14' AND CURRENT_DATE AND offers_clicks.ranking_score BETWEEN 0.24 AND 3.49 WHERE users.country = 'US' AND users.last_active BETWEEN '2015-02-26' AND CURRENT_DATE 

Make sure there are indexes on users(country, last_active, id) and offers_clicks(user_id, ranking_score, date) and USE .

Let me know how it works, and if it works, I will explain why.

0
source

First of all, I also think that you should use join and try to join only the lines that you really need as a result. Regarding table Offers_clicks, I think you should not use index user_id_3 and use user_id_2 because the power of user_id_2 is higher than the power of user_id_3 (according to your indexes) and it should be faster.

 SELECT count(distinct(users.id)) AS count_1 FROM users USE INDEX (country_2) JOIN offers_clicks USE INDEX (user_id_2) ON offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24 WHERE users.country = 'US' AND users.last_active > '2015-02-26' ; 

For this query you do not need to modify the table, so I think you can try.
It may be useful to try to reduce the date range, and as a result reduce the number of rows in the results, this should be faster.

Not sure if I will be helpful ...

0
source

Source: https://habr.com/ru/post/983514/


All Articles