Why is my MySQL group so slow?

I am trying to query a partitioned table (by month) approaching rows of 20M. I need to group DATE (transaction_utc) as well as country_id. The rows that are returned if I disconnect the group by groups and aggregates are a little more than 40k, which is not so much, however adding a group makes the query much slower, unless GROUP BY is in the transaction_utc column, in which case it gets FAST .

I tried to optimize this first query below by changing the query and / or indexes and reaching a point lower (about 2 times faster than the original), but still stuck in a 5-second query to sum 45 thousand rows, which seems too big.

For reference, this field is a completely new 24 logical core, 64-gigabyte RAM, Mariadb-5.5.x server with more INNODB buffer pool than the index space on the server, so there should be no pressure on the RAM or CPU.

So, I'm looking for ideas on what makes this slow down, and suggestions for speeding it up. Any feedback would be greatly appreciated! :)

Ok, on the details ...

The following query (the one I really need) takes about 5 seconds (+/-) and returns less than 100 rows.

SELECT lss.`country_id` AS CountryId , Date(lss.`transaction_utc`) AS TransactionDate , c.`name` AS CountryName, lss.`country_id` AS CountryId , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD FROM `sales` lss JOIN `countries` c ON lss.`country_id` = c.`country_id` WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`country_id`, DATE(lss.`transaction_utc`) 

EXPLAIN SELECT for the same query is as follows. Note that it does not use the transaction_utc key. Shouldn't I use my coverage index?

 id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE lss ref idx_unique,transaction_utc,country_id idx_unique 50 const 1208802 Using where; Using temporary; Using filesort 1 SIMPLE c eq_ref PRIMARY PRIMARY 4 georiot.lss.country_id 1 

Now for a couple of other parameters that I tried to try to determine what was going on ...

The following query (with a modified group) takes about 5 seconds (+/-) and returns only 3 rows:

 SELECT lss.`country_id` AS CountryId , DATE(lss.`transaction_utc`) AS TransactionDate , c.`name` AS CountryName, lss.`country_id` AS CountryId , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD FROM `sales` lss JOIN `countries` c ON lss.`country_id` = c.`country_id` WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`country_id` 

The following query (remote group) takes 4-5 seconds (+/-) and returns 1 row:

 SELECT lss.`country_id` AS CountryId , DATE(lss.`transaction_utc`) AS TransactionDate , c.`name` AS CountryName, lss.`country_id` AS CountryId , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD FROM `sales` lss JOIN `countries` c ON lss.`country_id` = c.`country_id` WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) 

The following query takes .00X seconds (+/-) and returns ~ 45k rows. This for me shows that with max we try to group 45K rows into less than 100 groups (as in my initial request):

 SELECT lss.`country_id` AS CountryId , DATE(lss.`transaction_utc`) AS TransactionDate , c.`name` AS CountryName, lss.`country_id` AS CountryId , COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD , COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD FROM `sales` lss JOIN `countries` c ON lss.`country_id` = c.`country_id` WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`transaction_utc` 

DIAGRAM TABLE:

 CREATE TABLE IF NOT EXISTS `sales` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `user_linkshare_account_id` int(11) unsigned NOT NULL, `username` varchar(16) NOT NULL, `country_id` int(4) unsigned NOT NULL, `order` varchar(16) NOT NULL, `raw_tracking_code` varchar(255) DEFAULT NULL, `transaction_utc` datetime NOT NULL, `processed_utc` datetime NOT NULL , `sku` varchar(16) NOT NULL, `sale_original` decimal(10,4) NOT NULL, `sale_usd` decimal(10,4) NOT NULL, `quantity` int(11) NOT NULL, `commission_original` decimal(10,4) NOT NULL, `commission_usd` decimal(10,4) NOT NULL, `original_currency` char(3) NOT NULL, PRIMARY KEY (`id`,`transaction_utc`), UNIQUE KEY `idx_unique` (`username`,`order`,`processed_utc`,`sku`,`transaction_utc`), KEY `raw_tracking_code` (`raw_tracking_code`), KEY `idx_usd_amounts` (`sale_usd`,`commission_usd`), KEY `idx_countries` (`country_id`), KEY `transaction_utc` (`transaction_utc`,`username`,`country_id`,`sale_usd`,`commission_usd`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 /*!50100 PARTITION BY RANGE ( TO_DAYS(`transaction_utc`)) (PARTITION pOLD VALUES LESS THAN (735112) ENGINE = InnoDB, PARTITION p201209 VALUES LESS THAN (735142) ENGINE = InnoDB, PARTITION p201210 VALUES LESS THAN (735173) ENGINE = InnoDB, PARTITION p201211 VALUES LESS THAN (735203) ENGINE = InnoDB, PARTITION p201212 VALUES LESS THAN (735234) ENGINE = InnoDB, PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ AUTO_INCREMENT=19696320 ; 
+4
source share
1 answer

The offensive part is probably GROUP BY DATE(transaction_utc) . You also claim that there is a coverage index for this request, but I do not see it. Your five-column index contains all the columns used in the query, but not in the best order (this is: WHERE - GROUP BY - SELECT ).

So, the engine, not finding a useful index, would have to evaluate this function for all 20M lines. In fact, it finds an index starting with username ( idx_unique ), and it uses that, so it should evaluate the function for (only) 1.2M lines. If you have (transaction_utc) or (username, transaction_utc) , he will choose the most useful of the three.

Can you let me change the structure of the table by dividing the columns by date and time? If possible, then an index on (username, country_id, transaction_date) or (changing the order of two columns used for grouping), on (username, transaction_date, country_id) will be quite effective.

The coverage index on (username, country_id, transaction_date, sale_usd, commission_usd) even better.


If you want to keep the current structure, try changing the order inside your 5-column index to:

 (username, country_id, transaction_utc, sale_usd, commission_usd) 

or:

 (username, transaction_utc, country_id, sale_usd, commission_usd) 

Since you are using MariaDB, you can use VIRTUAL columns without changing existing columns

Add a virtual (persistent) column and the corresponding index:

 ALTER TABLE sales ADD COLUMN transaction_date DATE NOT NULL AS DATE(transaction_utc) PERSISTENT ADD INDEX special_IDX (username, country_id, transaction_date, sale_usd, commission_usd) ; 
+9
source

Source: https://habr.com/ru/post/1442535/


All Articles