Improving SQL query speed with MAX, WHERE, and GROUP BY on three different columns

I am trying to speed up a query that takes about 60 seconds to populate a table of ~ 20 million rows.

In this example, the table has three columns (id, dateAdded, name). id is the primary key. Indexes I added to the table:

(dateAdded)
(name)
(id, name)
(id, name, dateAdded)

The query I'm trying to run is:

SELECT MAX(id) as id, name 
FROM exampletable 
WHERE dateAdded <= '2014-01-20 12:00:00' 
GROUP BY name 
ORDER BY NULL;

Date has a variable from request to request.

The purpose of this is to get the most recent record for each name before or after adding the date.

When I use the explanation on request, it tells me that it uses the index (id, name, dateAdded).

+----+-------------+------------------+-------+------------------+----------------------------------------------+---------+------+----------+-----------------------------------------------------------+
| id | select_type | table            | type  | possible_keys    | key                                          | key_len | ref  | rows     | Extra                                                     |
+----+-------------+------------------+-------+------------------+----------------------------------------------+---------+------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | exampletable     | index | date_added_index | id_element_name_date_added_index             | 162     | NULL | 22016957 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+------------------+-------+------------------+----------------------------------------------+---------+------+----------+-----------------------------------------------------------+

Edit: Added two new indexes from comments:

(dateAdded, name, id)
(name, id)

+----+-------------+------------------+-------+---------------------------------------------------------------+----------------------------------------------+---------+------+----------+-------------------------------------------+
| id | select_type | table            | type  | possible_keys                                                 | key                                          | key_len | ref  | rows     | Extra                                     |
+----+-------------+------------------+-------+---------------------------------------------------------------+----------------------------------------------+---------+------+----------+-------------------------------------------+
|  1 | SIMPLE      | exampletable     | index | date_added_index,date_added_name_id_index                     | id__name_date_added_index                    | 162     | NULL | 22040469 | Using where; Using index; Using temporary |
+----+-------------+------------------+-------+---------------------------------------------------------------+----------------------------------------------+---------+------+----------+-------------------------------------------+

Edit: Added create script table.

CREATE TABLE `exampletable` (
  `id` int(10) NOT NULL auto_increment,
  `dateAdded` timestamp NULL default CURRENT_TIMESTAMP,
  `name` varchar(50) character set utf8 default '',
  PRIMARY KEY  (`id`),
  KEY `date_added_index` (`dateAdded`),
  KEY `name_index` USING BTREE (`name`),
  KEY `id_name_index` USING BTREE (`id`,`name`),
  KEY `id_name_date_added_index` USING BTREE (`id`,`dateAdded`,`name`),
  KEY `date_added_name_id_index` USING BTREE (`dateAdded`,`name`,`id`),
  KEY `name_id_index` USING BTREE (`name`,`id`)
) ENGINE=MyISAM AUTO_INCREMENT=22046064 DEFAULT CHARSET=latin1

Edit: Here is an explanation from the answer provided by HeavyE.

+----+-------------+--------------+-------+------------------------------------------------------------------------------------------+--------------------------+---------+--------------------------------------------------+------+---------------------------------------+
| id | select_type | table        | type  | possible_k                                                                               | key                      | key_len | ref                                              | rows | Extra                                 |
+----+-------------+--------------+-------+------------------------------------------------------------------------------------------+--------------------------+---------+--------------------------------------------------+------+---------------------------------------+
|  1 | PRIMARY     | <derived2>   | ALL   | NULL                                                                                     | NULL                     | NULL    | NULL                                             | 1732 | Using temporary; Using filesort       |
|  1 | PRIMARY     | example1     | ref   | date_added_index,name_index,date_added_name_id_index,name_id_index,name_date_added_index | date_added_name_id_index | 158     | maxDateByElement.dateAdded,maxDateByElement.name |    1 | Using where; Using index              |
|  2 | DERIVED     | exampletable | range | date_added_index,date_added_name_id_index                                                | name_date_added_index    | 158     | NULL                                             | 1743 | Using where; Using index for group-by |
+----+-------------+--------------+-------+------------------------------------------------------------------------------------------+--------------------------+---------+--------------------------------------------------+------+---------------------------------------+
+4
3

: fooobar.com/questions/819/...

, :

SELECT example1.name, MAX(example1.id)
FROM exampletable example1
INNER JOIN (
select name, max(dateAdded) dateAdded
from exampletable
where dateAdded  <= '2014-01-20 12:00:00' 
group by name
) maxDateByElement on example1.name = maxDateByElement.name AND example1.dateAdded = maxDateByElement.dateAdded
GROUP BY name;
+5

? where , , dateAdded , sql-, :

SELECT MAX(id) as id, name 
FROM exampletable 
USE INDEX (dateAdded_index) USE INDEX FOR GROUP BY (name_index) 
WHERE dateAdded <= '2014-01-20 12:00:00' 
GROUP BY name
ORDER BY NULL;

, . , , .

+2

IF the where command does not matter, then its either max (id) or name. I would check the indexes by excluding Max (id) completely and see if the group is fast by name. Then I will add Min (id) to find out if it will be faster than Max (id). (I saw that it matters).

In addition, you must check the order with NULL. Try Order by name desc or Order by name asc. Clark Vera

0
source

Source: https://habr.com/ru/post/1524344/


All Articles