SELECT with query variables not using INDEX

Question

SELECT with query variables not using INDEX

I played (out of interest) with fetching a tree of nodes in a simple adjacency list with a recursive query using local variables.

A solution that I still enjoy, but I wonder (and this is my only question) why MySQL refuses to use any INDEX to optimize this query. Should MySQL not search for the closest child using INDEX ?

I am curious why MySQL does not. Even when I use FORCE INDEX , the execution plan does not change.

This is the request so far, with 5 being the identifier of the parent node:

 SELECT @last_id := id AS id, parent_id, name, @depth := IF(parent_id = 5, 1, @depth + 1) AS depth FROM tree FORCE INDEX (index_parent_id, PRIMARY, index_both), (SELECT @last_id := 5, @depth := -1) vars WHERE id = 5 OR parent_id = @last_id OR parent_id = 5

Try live example in SQLfiddle

Note that the reason cannot be a small dataset because the behavior does not change when I specify FORCE INDEX (id) or FORCE INDEX (parent_id) or FORCE INDEX (id, parent_id) ...

The docs say:

You can also use FORCE INDEX, which acts like a USE INDEX (index_list), but with the addition that scanning a table is considered very expensive. In other words, a table scan is used only if there is no way to use one of the specified indexes to search for rows in the table.

There must be something that makes the request unable to use INDEX, but I don’t understand what it is.

Disclaimer: I know that there are various ways to store and retrieve hierarchical data in SQL. I know about the model of nested sets. I am not looking for an alternative implementation. I am not looking for nested sets.

I also know that the request itself is a nut and leads to incorrect results.

I just want to understand (in detail) why MySQL does not use INDEX in this case.

+6

mysql indexing hierarchical-data adjacency-list query-variables

Kaii Jul 9 '12 at 21:53

source share

1 answer

Shlomi noach · Accepted Answer · 2012-07-10T07:50:22+0000

The reason is because the OR clause is used in the WHERE clause .

To illustrate this, try executing the request again, this time only with the condition id = 5 and get (EXPLAIN output):

 +----+-------------+------------+--------+--------------------+---------+---------+-------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+--------------------+---------+---------+-------+------+----------------+ | 1 | PRIMARY | <derived2> | system | NULL | NULL | NULL | NULL | 1 | | | 1 | PRIMARY | tree | const | PRIMARY,index_both | PRIMARY | 4 | const | 1 | | | 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | +----+-------------+------------+--------+--------------------+---------+---------+-------+------+----------------+

And again, this time only with the condition parent_id = @last_id OR parent_id = 5 and we get:

 +----+-------------+------------+--------+-----------------+------+---------+------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+-----------------+------+---------+------+------+----------------+ | 1 | PRIMARY | <derived2> | system | NULL | NULL | NULL | NULL | 1 | | | 1 | PRIMARY | tree | ALL | index_parent_id | NULL | NULL | NULL | 10 | Using where | | 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used | +----+-------------+------------+--------+-----------------+------+---------+------+------+----------------+

MySQL is not very good at handling multiple indexes in a single query. Things are a little better with the terms AND; it is more likely that index_merge optimizations than index union .

Things improve as versions progress, but I tested your request on version 5.5 , which is on the current latest production version, and the results, as you described.

To explain why this is difficult, consider: two different indexes will respond to two different query conditions. One answer will be responsible for id = 5 , and the other for parent_id = @last_id OR parent_id = 5 (BTW without problems with OR inside the latter, since both terms are processed from the same index).

There is no single index that can answer for both, and therefore the FORCE INDEX command is ignored. See, FORCE INDEX says that MySQL should use an index to scan a table. This does not mean that more than one index must be used to scan a table.

So, MySQL follows the documentation rules here. But why is it so complicated? Since MySQL must collect results from both for a response using both indexes, store them in a temporary buffer, controlling the second. Then you need to go through this buffer to filter out identical lines (it is possible that some line meets all conditions). And then scan this buffer to return the results.

But wait, this buffer is not indexed by itself. Filtering duplicates is not an obvious task. Therefore, MySQL prefers to work with the source table and scan there, and avoid all this mess.

Of course, this is solvable. Oracle engineers can still improve this (they have been working hard lately to improve the query of execution plans), but I don’t know if it is on the TODO task, or if it has high priority.

SELECT with query variables not using INDEX

More articles: