GROUP BY WHERE range AND const ref without time

Question

GROUP BY WHERE range AND const ref without time

In fact, the base table containing quotes on instruments on several exchanges using the TokuDB storage mechanism:

CREATE TABLE `quotes` ( `ticker` char(4) NOT NULL, `timestamp` time(3) NOT NULL, `price` decimal(7,2) unsigned NOT NULL, `size` smallint(5) unsigned NOT NULL, `exchange` char(3) NOT NULL, KEY `best_price` (`ticker`,`timestamp`,`exchange`,`price`), KEY `best_size` (`exchange`,`ticker`,`price`,`timestamp`) ) ENGINE=TokuDB

Whenever I ask for the best price on all exchanges, she always uses a temporary table. The presence of exchange and price only in the index leads to the fact that index scanning is equivalent to the cluster key on (ticker, timestamp) in TokuDB.

 EXPLAIN SELECT max(price),exchange FROM quotes WHERE ticker="A" AND timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961" GROUP BY exchange ORDER BY NULL \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: quotes type: range possible_keys: best_price key: best_price key_len: 9 ref: NULL rows: 2690 Extra: Using where; Using index; Using temporary

Is it possible to determine a configuration that would not use a temporary table? This is only possible when dropping a timestamp article:

 EXPLAIN SELECT max(price),exchange FROM quotes WHERE ticker="A" GROUP BY exchange ORDER BY NULL \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: quotes type: range possible_keys: best_price key: best_size key_len: 7 ref: NULL rows: 96 Extra: Using where; Using index for group-by

Example of output from the request for the best price:

 +------------+----------+ | max(price) | exchange | +------------+----------+ | 41.06 | BTY | | 41.06 | DEA | | 41.07 | NYS | | 41.07 | THM | | 41.06 | PSE | | 41.07 | BAT | | 41.06 | DEX | | 41.06 | BOS | | 41.06 | ADC | | 41.06 | XPH | +------------+----------+ 10 rows in set (0.01 sec)

And a bottleneck (3 ms) handles every single line in a time range:

 +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000071 | | checking permissions | 0.000005 | | Opening tables | 0.000016 | | After opening tables | 0.000006 | | System lock | 0.000014 | | Table lock | 0.000002 | | After table lock | 0.000005 | | init | 0.000038 | | optimizing | 0.000024 | | statistics | 0.000155 | | preparing | 0.000028 | | executing | 0.000003 | | Copying to tmp table | 0.000031 | | Copying to tmp table | 0.003381 | | Sending data | 0.000017 | | end | 0.000004 | | removing tmp table | 0.000020 | | end | 0.000002 | | query end | 0.000005 | | closing tables | 0.000005 | | freeing items | 0.000006 | | updating status | 0.000011 | | cleaning up | 0.000002 | +----------------------+----------+

The time range contains 2316 lines, divided into exchanges:

 +----------+----------+ | exchange | count(*) | +----------+----------+ | ADC | 71 | | BAT | 298 | | BOS | 129 | | BTY | 266 | | DEA | 153 | | DEX | 60 | | NYS | 530 | | PSE | 325 | | THM | 453 | | XPH | 31 | +----------+----------+

I tried crazy and added every permutation of the coverage index, and MariaDB cannot find a better key. Are there any other databases that I should look at?

Example dataset for time range and ticker symbol: http://pastebin.com/b5RcTXAs

+6

mysql mariadb tokudb

Steve-o Dec 25 '14 at 3:56

source share

1 answer

Steve-o · Accepted Answer · 2014-12-28T04:30:27+0000

The answer is all about optimization. MySQL preferred to use a temporary table because it is considered more reasonable than using indexes to retrieve each exchange. If you use a clustered index in exchange,ticker,timestamp , then the request can work without a temporary one:

 MariaDB [trth]> explain SELECT min(ask_price),exchange FROM quotes USE INDEX (exchange_ticker_timestamp) WHERE exchange IN ("NYS","BOS","CIN","ADC","DEX","DEA","MID","PSE","THM","WCB","BAT","XPH","BTY") AND ticker="A" AND timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961" GROUP BY exchange ORDER BY NULL \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: quotes type: range possible_keys: exchange_ticker_timestamp key: exchange_ticker_timestamp key_len: 10 ref: NULL rows: 2589 Extra: Using where; Using index MariaDB [trth]> show profile; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000079 | | checking permissions | 0.000006 | | Opening tables | 0.000014 | | After opening tables | 0.000011 | | System lock | 0.000014 | | Table lock | 0.000003 | | After table lock | 0.000005 | | init | 0.000043 | | optimizing | 0.000019 | | statistics | 0.000234 | | preparing | 0.000027 | | executing | 0.000008 | | Sorting result | 0.000002 | | Sending data | 0.002985 | | end | 0.000006 | | query end | 0.000010 | | closing tables | 0.000006 | | freeing items | 0.000007 | | updating status | 0.000138 | | cleaning up | 0.000004 | +----------------------+----------+

Comparison with the temporary grouping of tables:

 MariaDB [trth]> explain SELECT min(ask_price),exchange FROM quotes WHERE ticker="A" AND timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961" GROUP BY exchange ORDER BY NULL \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: quotes type: range possible_keys: ticker_timestamp key: ticker_timestamp key_len: 9 ref: NULL rows: 1515 Extra: Using where; Using temporary MariaDB [trth]> show profile; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000091 | | checking permissions | 0.000009 | | Opening tables | 0.000037 | | After opening tables | 0.000009 | | System lock | 0.000052 | | Table lock | 0.000004 | | After table lock | 0.000009 | | init | 0.000049 | | optimizing | 0.000025 | | statistics | 0.000144 | | preparing | 0.000039 | | executing | 0.000003 | | Copying to tmp table | 0.000040 | | Copying to tmp table | 0.004674 | | Sending data | 0.000020 | | end | 0.000003 | | removing tmp table | 0.000015 | | end | 0.000003 | | query end | 0.000004 | | closing tables | 0.000006 | | freeing items | 0.000006 | | updating status | 0.000204 | | cleaning up | 0.000004 | +----------------------+----------+

It is interesting to note that the first query scans more lines, but runs faster than the second.

GROUP BY WHERE range AND const ref without time

More articles: