How can I speed up queries that are looking for a transitive close root node?

I have a historical transitive closure table that represents a tree.

create table TRANSITIVE_CLOSURE ( CHILD_NODE_ID number not null enable, ANCESTOR_NODE_ID number not null enable, DISTANCE number not null enable, FROM_DATE date not null enable, TO_DATE date not null enable, constraint TRANSITIVE_CLOSURE_PK unique (CHILD_NODE_ID, ANCESTOR_NODE_ID, DISTANCE, FROM_DATE, TO_DATE) ); 

Here are some sample data:

 CHILD_NODE_ID | ANCESTOR_NODE_ID | DISTANCE -------------------------------------------- 1 | 1 | 0 2 | 1 | 1 2 | 2 | 0 3 | 1 | 2 3 | 2 | 1 3 | 3 | 0 

Unfortunately, my current query to search for the root of the node causes a full table scan:

 select * from transitive_closure tc where distance = 0 and not exists ( select null from transitive_closure tci where tc.child_node_id = tci.child_node_id and tci.distance <> 0 ); 

At first glance, this does not look too expensive, but as you approach 1 million rows, this particular query starts to become unpleasant ... especially when it is part of a view that spans the adjacency tree for old support.

Is there a better way to find the root node of a transitive closure? I would like to rewrite all of our old code for legacy code, but I can't ... so I need to somehow create an adjacency list. Getting everything except the root node is easy, so is there a better way? Am I thinking about this problem wrong?

Query plan for a table with 800k rows.

 OPERATION OBJECT_NAME OPTIONS COST SELECT STATEMENT 2301 HASH JOIN RIGHT ANTI 2301 Access Predicates TC.CHILD_NODE_ID=TCI.CHILD_NODE_ID TABLE ACCESS TRANSITIVE_CLOSURE FULL 961 Filter Predicates TCI.DISTANCE = 1 TABLE ACCESS TRANSITIVE_CLOSURE FULL 962 Filter Predicates DISTANCE=0 
+4
source share
3 answers

How long does a query take to execute and how long do you want to execute it? (Normally, you donโ€™t want to use the cost for customization. Very few people know what the cost of an explanation plan means.)

On my slow desktop, the query took only 1.5 seconds for 800K lines. And 0.5 seconds after the data was in memory. Are you getting something significantly worse, or will this request be executed very often?

I donโ€™t know what your data looks like, but I would suggest that a full table scan will always be the best for this query. Assuming your hierarchical data is relatively shallow, that is, there are many distances between 0 and 1, but very few distances of 100, the most important column will not be very different. This means that any of the index entries for the distance will indicate a large number of blocks. It would be much cheaper to immediately read the entire table using multi-block readings than to read a large number of it one block at a time.

Also, what do you mean by historical? Can you save the results of this query in a materialized form?

Another possible idea is to use analytic functions. This replaces the second table scan with a sort. This approach is usually faster, but for me this request actually takes longer, 5.5 seconds instead of 1.5. But perhaps it will be better in your environment.

 select * from ( select max(case when distance <> 0 then 1 else 0 end) over (partition by child_node_id) has_non_zero_distance ,transitive_closure.* from transitive_closure ) where distance = 0 and has_non_zero_distance = 0; 
+2
source

Can you try adding an index at a distance and child_node_id or reordering this column in an existing unique index ? I think that then the external query will have access to the table by index over a distance, while the internal query only needs access to the index.

0
source

Add ONE root node from which all current root nodes descend. Then you just ask the children of your one root. The problem is resolved.

0
source

Source: https://habr.com/ru/post/1343224/


All Articles