EDIT . Having looked at some of the answers here and at the hours of research, my team came to the conclusion that, most likely, there is no way to optimize this further than the 4.5 seconds that we were able to achieve (if possible, with separation on clicks sentences, but this will have some ugly side effects). In the end, after a lot of brainstorming, we decided to separate both requests, create two sets of user identifiers (one from the user table and one from clicks_ suggestions) and compare them with the set in Python. The set of identifiers from the user table is still pulled from SQL, but we decided to move offer_clicks to Lucene, and also added some caching on top of it, so now from a different set of identifiers. The end result is that it is about half a second with a cache and 0.9 seconds without a cache.
Start of original post: I have a problem with query optimization. The first version of the request is fine, but the moment of clic_suggest is connected in the second request, the request becomes rather slow. The user table contains 10 million rows, contains_clicks contains 53 million rows.
Allowable performance:
SELECT count(distinct(users.id)) AS count_1 FROM users USE index (country_2) WHERE users.country = 'US' AND users.last_active > '2015-02-26'; 1 row in set (0.35 sec)
Bad:
SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country = 'US' AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; 1 row in set (7.39 sec)
Here's how it looks without specific indexes (worse):
SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; 1 row in set (17.72 sec)
I explain:
explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks USE index (user_id_3), users USE index (country_2) WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; +
Explain without specifying any indexes:
mysql> explain SELECT count(distinct(users.id)) AS count_1 FROM offers_clicks, users WHERE users.country IN ('US') AND users.last_active > '2015-02-26' AND offers_clicks.user_id = users.id AND offers_clicks.date > '2015-02-14' AND offers_clicks.ranking_score < 3.49 AND offers_clicks.ranking_score > 0.24; +----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+ | 1 | SIMPLE | users | range | PRIMARY,last_active,country,last_active_2,country_2 | country_2 | 14 | NULL | 221606 | Using where; Using index | | 1 | SIMPLE | offers_clicks | ref | user_id,user_id_2,date,date_2,date_3,ranking_score,user_id_3,user_id_4 | user_id_2 | 4 | dejong_pointstoshop.users.id | 3 | Using where | +----+-------------+---------------+-------+------------------------------------------------------------------------+-----------+---------+------------------------------+--------+--------------------------+
Here is a whole group of indexes that I tried with not too much success:
+---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +---------------+------------+-----------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | offers_clicks | 1 | user_id_3 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_3 | 2 | ranking_score | A | 198 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_3 | 3 | date | A | 198 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_2 | 1 | user_id | A | 17838712 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_2 | 2 | date | A | 53516137 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_4 | 1 | user_id | A | 198 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_4 | 2 | date | A | 198 | NULL | NULL | | BTREE | | | | offers_clicks | 1 | user_id_4 | 3 | ranking_score | A | 198 | NULL | NULL | | BTREE | | | | users | 1 | country_2 | 1 | country | A | 14 | NULL | NULL | | BTREE | | | | users | 1 | country_2 | 2 | last_active | A | 8048529 | NULL | NULL | | BTREE | | |
Simplified user schema:
+---------------------------------+---------------+------+-----+---------------------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------------------------------+---------------+------+-----+---------------------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | country | char(2) | NO | MUL | | | | last_active | datetime | NO | MUL | 2000-01-01 00:00:00 | |
Simplified click-through scheme of offers:
+-----------------+------------------+------+-----+---------------------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------------+------------------+------+-----+---------------------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | user_id | int(11) | NO | MUL | 0 | | | offer_id | int(11) unsigned | NO | MUL | NULL | | | date | datetime | NO | MUL | 0000-00-00 00:00:00 | | | ranking_score | decimal(5,2) | NO | MUL | 0.00 | |