In fact, the base table containing quotes on instruments on several exchanges using the TokuDB storage mechanism:
CREATE TABLE `quotes` ( `ticker` char(4) NOT NULL, `timestamp` time(3) NOT NULL, `price` decimal(7,2) unsigned NOT NULL, `size` smallint(5) unsigned NOT NULL, `exchange` char(3) NOT NULL, KEY `best_price` (`ticker`,`timestamp`,`exchange`,`price`), KEY `best_size` (`exchange`,`ticker`,`price`,`timestamp`) ) ENGINE=TokuDB
Whenever I ask for the best price on all exchanges, she always uses a temporary table. The presence of exchange and price only in the index leads to the fact that index scanning is equivalent to the cluster key on (ticker, timestamp) in TokuDB.
EXPLAIN SELECT max(price),exchange FROM quotes WHERE ticker="A" AND timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961" GROUP BY exchange ORDER BY NULL \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: quotes type: range possible_keys: best_price key: best_price key_len: 9 ref: NULL rows: 2690 Extra: Using where; Using index; Using temporary
Is it possible to determine a configuration that would not use a temporary table? This is only possible when dropping a timestamp article:
EXPLAIN SELECT max(price),exchange FROM quotes WHERE ticker="A" GROUP BY exchange ORDER BY NULL \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: quotes type: range possible_keys: best_price key: best_size key_len: 7 ref: NULL rows: 96 Extra: Using where; Using index for group-by
Example of output from the request for the best price:
+------------+----------+ | max(price) | exchange | +------------+----------+ | 41.06 | BTY | | 41.06 | DEA | | 41.07 | NYS | | 41.07 | THM | | 41.06 | PSE | | 41.07 | BAT | | 41.06 | DEX | | 41.06 | BOS | | 41.06 | ADC | | 41.06 | XPH | +------------+----------+ 10 rows in set (0.01 sec)
And a bottleneck (3 ms) handles every single line in a time range:
+----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000071 | | checking permissions | 0.000005 | | Opening tables | 0.000016 | | After opening tables | 0.000006 | | System lock | 0.000014 | | Table lock | 0.000002 | | After table lock | 0.000005 | | init | 0.000038 | | optimizing | 0.000024 | | statistics | 0.000155 | | preparing | 0.000028 | | executing | 0.000003 | | Copying to tmp table | 0.000031 | | Copying to tmp table | 0.003381 | | Sending data | 0.000017 | | end | 0.000004 | | removing tmp table | 0.000020 | | end | 0.000002 | | query end | 0.000005 | | closing tables | 0.000005 | | freeing items | 0.000006 | | updating status | 0.000011 | | cleaning up | 0.000002 | +----------------------+----------+
The time range contains 2316 lines, divided into exchanges:
+----------+----------+ | exchange | count(*) | +----------+----------+ | ADC | 71 | | BAT | 298 | | BOS | 129 | | BTY | 266 | | DEA | 153 | | DEX | 60 | | NYS | 530 | | PSE | 325 | | THM | 453 | | XPH | 31 | +----------+----------+
I tried crazy and added every permutation of the coverage index, and MariaDB cannot find a better key. Are there any other databases that I should look at?
Example dataset for time range and ticker symbol: http://pastebin.com/b5RcTXAs