Why is this mysql update-with-join query so slow?

I have an application that needs to update nodes in a hierarchical structure, up from a specific node whose identifier is known. For this, I use the following MySQL statement:

update node as A join node as B on A.lft<=B.lft and A.rgt>=B.rgt set A.count=A.count+1 where B.id=? 

The table has a primary key on id and indexes on lft and rgt. The statement works, but I found that it had performance issues. If you look at the EXPLAIN results for the corresponding select statement, I saw that the number of rows checked for table "B" was very large (possibly the whole table).

I can easily split the request into two separate parts:

 select lft, rgt from node where id=? LFT=result.lft RGT=result.rgt update node set count=count+1 where lft<=LFT and rgt>=RGT 

But why doesn't the original statement work as expected, and how do I need to reformulate it for better performance?

Upon request, an abridged version of the create table is provided here:

 CREATE TABLE `node` ( `id` int(11) NOT NULL auto_increment, `name` varchar(255) NOT NULL, `lft` decimal(64,0) NOT NULL, `rgt` decimal(64,0) NOT NULL, `count` int(11) NOT NULL default '0', PRIMARY KEY (`id`), KEY `name` (`name`), KEY `location` (`location`(255)), KEY `lft` (`lft`), KEY `rgt` (`rgt`), ) ENGINE=InnoDB 

I did not try to add a composite index (in fact, I do not have the access level necessary for this on the spot); but I don’t understand how this will help, trying to understand how the database engine will try to solve the dual inequality.

+4
source share
4 answers

You can "force" (at least until 5.5, version 5.6 has improved optimization algorithms that can do redundant overwriting). MySQL first evaluates the conditions in table B, taking the first part of your partition as a subquery, and then using this as a view and joining table A:

 UPDATE node AS a JOIN ( SELECT lft, rgt FROM node WHERE id = ? ) AS b ON a.lft <= b.lft AND a.rgt >= b.rgt SET a.count = a.count + 1 ; 

Efficiency will still depend on which of the two indexes is chosen to limit the updated rows. Still after using either of these two indexes, a table lookup is needed to check the other column. Therefore, I suggest that you add a composite index on (lft, rgt) and one on (rgt, lft) , so only one index is used to determine which rows should be updated.

I believe that you are using the Nested Set, and the effectiveness of this update will not be large in a large table, since the query has two range conditions and this limits the efficiency of the B-tree indexes.

+7
source

I think the biggest performance issue is the unnecessary JOIN you are using. You can do this by simply making two small subqueries, instead of joining two large tables.

Here is an example:

 UPDATE node AS a SET a.count = a.count+1 WHERE a.lft <= (SELECT lft FROM node WHERE id = ?) AND a.rgt >= (SELECT rgt FROM node WHERE id = ?) 
+4
source

This is just a suggestion; I don't know if this will work.

The problem with your query is that you have inequalities in two columns. This makes it very difficult to use indexes for both of them, which in turn makes join very inefficient. This idea is to make two joins, one for each side of the inequality, and then include id in the on condition. Thus, only nodes that pass both will pass:

 UPDATE node a JOIN (SELECT lft, rgt FROM node WHERE id = ? ) l ON a.lft <= l.lft join (SELECT lft, rgt FROM node WHERE id = ? ) r on a.rgt >= r.rgt SET a.count = a.count + 1 ; 

Like I said, I don't know if this will work. But you should be able to easily check explain for the query to see if the plan uses indexes for both inequalities.

+3
source

I know that mysql has problems referencing the table being updated, but for me the obvious solution would be the following:

 update node A set A.count=A.count+1 WHERE EXISTS ( SELECT * FROM node B WHERE B.id=? AND A.lft<=B.lft and A.rgt>=B.rgt ); 
+1
source

Source: https://habr.com/ru/post/1502792/


All Articles