Why is CTE (recursive) not parallel (MAXDOP = 8)?

We have a fairly large machine with a memory capacity of 100 GB + and 8+ cores. MAXDOP server = 8.

T_SEQ_FF rowcount = 61692209, size = 2991152 KB 

UPD 1: Table T_SEQ_FF has two indexes:

 1) create index idx_1 on T_SEQ_FF (first_num) 2) create index idx_2 on T_SEQ_FF (second_num) 

The T_SEQ_FF table has first_num , second_num pairs numbers that should contain the sequence after cte:

 ;with first_entity as ( select first_num from T_SEQ_FF a where not exists (select 1 from T_SEQ_FF b where a.first_num = b.second_num) ) , cte as ( select a.first_num, a.second_num, a.first_num as first_key, 1 as sequence_count from T_SEQ_FF a inner join first_entity b on a.first_num = b.first_num union all select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1 from T_SEQ_FF a inner join cte on a.first_num = cte.second_num ) select * from cte option (maxrecursion 0); 

But when I run this query, I only see the plan of the serial request without Parallelism. If I remove the second part of the CTE from the request above:

 union all select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1 from T_SEQ_FF a inner join cte on a.first_num = cte.second_num 

then I could see that the request plan becomes Parallel using Repartition and Gather Streams.

Therefore, I can summarize that this is related to recurisve . CTE SQL Server does not use Parallelism when processing this query.

I believe that on such a large machine with tons of free resources, Parallelism should help complete the request faster.

While it works for ~ 40-50 minutes.

Could you please advise to use as many resources as possible in order to complete the request faster?

CTE is the only option because we need to populate sequences from the first_num - second_num , and these sequences can be of any length.

+4
source share
3 answers

I would try rewriting CTE to remove one of the steps i.e.

 ;cte as ( select a.first_num, a.second_num, a.first_num as first_key, 1 as sequence_count from T_SEQ_FF a where not exists (select 1 from T_SEQ_FF b where a.first_num = b.second_num) union all select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1 from T_SEQ_FF a inner join cte on a.first_num = cte.second_num ) select * from cte option (maxrecursion 0); 

If there is only one root element, it would be better to pass this into the query as a variable so that the value can be used by the query optimizer.

Another thing to try is to modify the query to get the root elements without a subquery, i.e. second_num is null or first_num = second_num.

+1
source

I'm not sure if this option is viable, but we have already ruled out many other common approaches: can you do explicit parallelization by splitting the first_entity set into pieces, then running this query in parallel with the code and finally merging this data together.

This is more complicated than just a t-sql solution, and I don’t know if this will work on your data, both data distribution and blocking can occur here.

+1
source

I came across a similar problem, and after a thorough analysis of the situation, as well as the problem in UNION ALL Performance IN SQL Server 2005, it seems to me that the cte link in the UNION ALL query disables parallelization (most likely, this is an error).

0
source

Source: https://habr.com/ru/post/1300387/


All Articles