MySQL index on numeric columns makes the query slower

I have an optimization problem with a rather large table (~ 1.7M rows).

Rows are selected on two columns; let's call them colA and colB. Both are of type DOUBLE (5 decimal places) and range over:

colA: -90 ~ 90
colB: -180 ~ 180

Without an index, any query of the form:

 SELECT * FROM table WHERE colA BETWEEN a AND b AND colB BETWEEN c AND d

takes about the same time to run (~1 second), regardless of the ranges (a, b) and (c, d), since MySQL must check every row.

If I add an index on colA and colB, two things happen. Queries in which the ranges (a, b) and (c, d) are small, for example:

 SELECT * FROM table WHERE colA BETWEEN -4 AND 4 AND colB BETWEEN 3 AND 7

are very fast (~1/10 of a second). However, the run time grows with the width of the requested ranges. For instance:

 SELECT * FROM table WHERE colA BETWEEN -80 AND 80 AND colB BETWEEN -150 AND 150

takes about a minute.

I know how B-trees work for strings, but I'm not sure about the mechanism when the data is numeric and the query is executed using a range.

If anyone can suggest how to optimize this query, I would be grateful. One thought is to use an index for small ranges and tell MySQL not to use it for larger ranges, however I could not find a command that allows this.

thanks

EDIT: Clarification

I forgot to mention something. The results are ordered by RAND(). I know how inefficient that is, but I have not seen another way to get a limited number of rows from a table in random order.

Adding RAND() does not affect the run time when there is no index, but it significantly increases the time taken when there is one.

EDIT2: This is with a composite index.

SMALL RANGE:

 EXPLAIN SELECT * FROM table WHERE colA BETWEEN 35 AND 38 AND colB BETWEEN -10 AND 5 ORDER BY RAND() LIMIT 20;

9783 rows

NO INDEX (fast)

 +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
 | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
 +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
 |  1 | SIMPLE      | table | ALL  | NULL          | NULL | NULL    | NULL | 1673784 | Using where |
 +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

WITH INDEX (very fast)

 +----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
 | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra       |
 +----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
 |  1 | SIMPLE      | table | range | test          | test | 18      | NULL | 136222 | Using where |
 +----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+



LARGE RANGE:

 EXPLAIN SELECT * FROM table WHERE colA BETWEEN -80 AND 80 AND colB BETWEEN -150 AND 150 ORDER BY RAND() LIMIT 20;

1631862 rows

NO INDEX (fast)

 +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
 | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
 +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
 |  1 | SIMPLE      | table | ALL  | NULL          | NULL | NULL    | NULL | 1673784 | Using where |
 +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

WITH INDEX (very slow: > 60 seconds)

 +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
 | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
 +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
 |  1 | SIMPLE      | table | ALL  | test          | NULL | NULL    | NULL | 1673784 | Using where |
 +----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

EDIT3:

To summarize (all queries are limited to returning 20 rows):

large range with rand(), with index: 45 seconds
large range without rand(), with index: 0.003 seconds

large range with rand(), no index: 1 second
large range without rand(), no index: 0.003 seconds

Anomaly: "large range with rand(), with index: 45 seconds".

3 answers

I know how B-trees work for strings, but I'm not sure about the mechanism when the data is numeric and the query is executed using a range.

They work the same for numbers as they do for strings.

Without an index, the query takes about the same time to run (~1 second), regardless of the ranges (a, b) and (c, d)

The execution time of a full table scan does not significantly depend on the contents of the WHERE clause. The time taken to access the index is proportional to the number of rows returned. If the query selects a significant portion of the table, using the index will always be slower than not using the index.

Index access is efficient only if the index is selective enough, i.e. the number of rows retrieved is small (some say 10% of the table at most). The execution time is roughly proportional to the number of rows returned and may end up slower than a full table scan.
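To put numbers on this for the two queries in the question, the fraction of the table that each range selects can be measured directly. This is only a sketch: the backtick-quoted `table` stands in for the real table name (TABLE is a reserved word in MySQL, so the placeholder has to be quoted), and the ranges are the ones from the question's EXPLAIN examples.

```sql
-- Fraction of rows matched by the small range.
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matching rows.
SELECT SUM(colA BETWEEN 35 AND 38 AND colB BETWEEN -10 AND 5) / COUNT(*)
         AS fraction_selected
FROM `table`;

-- The same measurement for the large range.
SELECT SUM(colA BETWEEN -80 AND 80 AND colB BETWEEN -150 AND 150) / COUNT(*)
         AS fraction_selected
FROM `table`;
```

Going by the row counts reported in the question (9783 and 1631862 matching rows against the roughly 1673784 rows that EXPLAIN estimates for the table), these fractions come to about 0.6% and 97.5%, so only the small range looks like a candidate for index access.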

One thought is to use an index for small ranges and tell MySQL not to use it for larger ranges, however I could not find a command that allows this.

The query optimizer should use statistics and heuristics to decide whether to use the index. You may need to update these statistics using OPTIMIZE TABLE. If it still cannot make the right decision, you can help it with index hints:

 SELECT * FROM table IGNORE INDEX (the_index) WHERE colA BETWEEN -80 AND 80 AND colB BETWEEN -150 AND 150

Other options would be dropping the index (if you never see any benefit from it, you may prefer to accept the constant one-second response time) or trying a composite index on both columns (which also helps only if the number of rows the query returns is small).
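The idea from the question, index for small ranges and no index for large ones, could be sketched along these lines. The index name the_index is the placeholder used above, `table` stands in for the real table name, and choosing which statement to send (for example by range width) is assumed to happen in the application. ANALYZE TABLE is shown as MySQL's statement for refreshing key distribution statistics; it is not mentioned in the answer itself.

```sql
-- Refresh the key distribution statistics the optimizer relies on.
ANALYZE TABLE `table`;

-- Small range: ask the optimizer to use the index.
SELECT * FROM `table` FORCE INDEX (the_index)
WHERE colA BETWEEN -4 AND 4
  AND colB BETWEEN 3 AND 7;

-- Large range: take the index out of consideration, leaving a full table scan.
SELECT * FROM `table` IGNORE INDEX (the_index)
WHERE colA BETWEEN -80 AND 80
  AND colB BETWEEN -150 AND 150;

-- Prefix either statement with EXPLAIN to confirm which access path is chosen.
EXPLAIN SELECT * FROM `table` IGNORE INDEX (the_index)
WHERE colA BETWEEN -80 AND 80
  AND colB BETWEEN -150 AND 150;
```

Hints override the optimizer, so it is worth timing both variants on real data before hard-coding a threshold for switching between them.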


Now that you mention LIMIT 20, this starts to make sense:

large range with rand(), with index: 45 seconds

NESTED LOOP with many results + SORT

Get ALL records in the range from the index, fetch them one by one from the table, then sort, then limit to 20.

large range without rand(), with index: 0.003 seconds

NESTED LOOP aborted after 20 records

Get 20 records from the index, fetch them one by one from the table and return them. There is no sort, so the fact that the range is large does not matter.

large range with rand(), no index: 1 second

FULL TABLE SCAN + SORT

Read the whole table, keep what is in the range, then sort, then limit to 20.

large range without rand(), no index: 0.003 seconds

FULL TABLE SCAN aborted after 20 records

Start reading the table, keep what is in the range, stop when you have 20 rows and return them.
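Since the cost in the slow case comes from fetching and sorting every matching row, one commonly suggested variation is to sort only a narrow key column and fetch the full rows for the 20 winners afterwards. This is not something the answer proposes. It is a sketch that assumes the table has a primary key column, called id here purely for illustration, and again uses `table` as a stand-in for the real table name.

```sql
-- Hypothetical: `id` is assumed to be the table's primary key.
SELECT t.*
FROM `table` AS t
JOIN (
    -- Sort only the id values, not whole rows.
    SELECT id
    FROM `table`
    WHERE colA BETWEEN -80 AND 80
      AND colB BETWEEN -150 AND 150
    ORDER BY RAND()
    LIMIT 20
) AS picked ON picked.id = t.id;
```

Whether this actually helps depends on the storage engine and the available indexes, so it should be checked with EXPLAIN and timed against the original query.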


Indexes on columns with many duplicate values are a waste.

Make sure your index uses both fields:

 create index idx_faster on tbl_mytbl (colA,colB) 

For colB you can add another:

 create index idx_colb on tbl_mytbl (colB) 
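To check how many duplicates the indexed columns really contain, the index statistics can be compared with exact counts. A minimal sketch, assuming the table name tbl_mytbl used in the statements above:

```sql
-- The Cardinality column is MySQL's estimate of the number of distinct values in each index.
SHOW INDEX FROM tbl_mytbl;

-- Exact distinct counts, for comparison with the total number of rows.
SELECT COUNT(*)             AS total_rows,
       COUNT(DISTINCT colA) AS distinct_colA,
       COUNT(DISTINCT colB) AS distinct_colB
FROM tbl_mytbl;
```

A distinct count close to total_rows means few duplicates; a much smaller one means many rows share the same value.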

Regards, / T


The last query should not take longer than the first. MySQL may not have updated the index statistics; see OPTIMIZE TABLE.

You can also check how it plans the query with EXPLAIN and EXPLAIN ANALYZE.

Finally, you can force the index to be ignored using IGNORE INDEX (idx_name).


Source: https://habr.com/ru/post/1332166/

