TSQL - recursive CTE is inefficient - need an alternative

Here is a table with sample data:

DECLARE @TestTable TABLE ( ItemID INT, A INT, B INT, Month INT) INSERT INTO @TestTable VALUES (1234, 5, 9, 1) INSERT INTO @TestTable VALUES (1234, 6, 9, 2) INSERT INTO @TestTable VALUES (4321, 5, 11, 1) INSERT INTO @TestTable VALUES (4321, 12, 11, 2) INSERT INTO @TestTable VALUES (1324, 14, 6, 1) INSERT INTO @TestTable VALUES (1324, 5, 6, 2) INSERT INTO @TestTable VALUES (1234, 1, 9, 3) INSERT INTO @TestTable VALUES (1324, 9, 6, 3) 

It should be noted that column B is always the same as it is used only once in this calculation, but is necessary for the initial calculation.

I am trying to subtract B from A in the first line, and then in the next lines, subtract the previous line difference from A. In fact, B - A = C on the first, and then C - A on all subsequent lines for the RELATED ItemID.

Here are the results that I expect:

 ItemID ABC Month RowNumber 1234 5 9 4 1 1 1234 6 9 -2 2 2 1234 1 9 -3 3 3 1324 14 6 -8 1 1 1324 5 6 -13 2 2 1324 9 6 -22 3 3 4321 5 11 6 1 1 4321 12 11 -6 2 2 

This is how I do it.

 ;WITH CTE_TestValue AS ( SELECT Main.ItemID, Main.A, Main.B, Main.Month, ROW_NUMBER() OVER (Partition BY Main.ItemID ORDER BY Main.Month) AS RowNumber FROM @TestTable AS Main ), CTE_TestColumnC AS ( SELECT MainA.ItemID, MainA.A, MainA.B, (MainA.B - MainA.A) AS C, MainA.Month, MainA.RowNumber FROM CTE_TestValue AS MainA WHERE MainA.Rownumber = 1 UNION ALL SELECT MainB.ItemID, MainB.A, MainB.B, (Sub.C - MainB.A) AS C, MainB.Month, MainB.RowNumber FROM CTE_TestValue AS MainB INNER JOIN CTE_TestColumnC AS Sub ON MainB.RowNumber - 1 = Sub.RowNumber AND MainB.ItemID = Sub.ItemID -- CROSS JOIN CTE_TestColumnC AS Sub -- WHERE Sub.RowNumber + 1 = MainB.RowNumber -- AND MainB.ItemID = Sub.ItemID ) SELECT Main.ItemID, Main.A, Main.B, Main.C, Main.Month, Main.RowNumber FROM CTE_TestColumnC AS Main ORDER BY ItemID, Month, RowNumber 

This works fine with a small sample of data, but I am dealing with 20,000 ItemId, each of which is repeated 10 times. It finishes all calculations of the first line instantly, as expected, and then the calculation time increases DRASTICALLY.

As you can see, I tried both INNER JOIN and CROSS JOIN . I believe that they have the same execution plan with the parameters that I gave CROSS JOIN .

Is there a more efficient / efficient way to accomplish this?

I let it work for 5 hours yesterday to see if it ended. This is not true.

Another note: when I use this in test data, I SELECT WITHOUT using ORDER to hopefully help speed things up. ORDER is for my convenience only when I verify a fact.

+4
source share
2 answers

Your problem is that you are using CTE as a source of recursive CTE. Your first CTE will be executed once for each iteration of your recursive CTE. With your test data, this means that CTE_TestValue is created 8 times.

Put the result of CTE_TestValue in a temporary table with a clustered primary key on (RowNumber, ItemID) and use this temporary table as a data source for the recursive CTE CTE_TestColumnC .

Also change the join condition in the recursive part to ON MainB.RowNumber = Sub.RowNumber + 1 . This will force the query to use the index in the temporary table.

 DECLARE @TestTable TABLE ( ItemID INT, A INT, B INT, Month INT) INSERT INTO @TestTable VALUES (1234, 5, 9, 1) INSERT INTO @TestTable VALUES (1234, 6, 9, 2) INSERT INTO @TestTable VALUES (4321, 5, 11, 1) INSERT INTO @TestTable VALUES (4321, 12, 11, 2) INSERT INTO @TestTable VALUES (1324, 14, 6, 1) INSERT INTO @TestTable VALUES (1324, 5, 6, 2) INSERT INTO @TestTable VALUES (1234, 1, 9, 3) INSERT INTO @TestTable VALUES (1324, 9, 6, 3) CREATE TABLE #TestValue ( ItemID INT, A INT, B INT, Month INT, RowNumber INT, primary key(RowNumber, ItemID) ) INSERT INTO #TestValue SELECT Main.ItemID, Main.A, Main.B, Main.Month, ROW_NUMBER() OVER (Partition BY Main.ItemID ORDER BY Main.Month) AS RowNumber FROM @TestTable AS Main ;WITH CTE_TestColumnC AS ( SELECT MainA.ItemID, MainA.A, MainA.B, (MainA.B - MainA.A) AS C, MainA.Month, MainA.RowNumber FROM #TestValue AS MainA WHERE MainA.Rownumber = 1 UNION ALL SELECT MainB.ItemID, MainB.A, MainB.B, (Sub.C - MainB.A) AS C, MainB.Month, MainB.RowNumber FROM #TestValue AS MainB INNER JOIN CTE_TestColumnC AS Sub ON MainB.RowNumber = Sub.RowNumber + 1 AND MainB.ItemID = Sub.ItemID ) SELECT Main.ItemID, Main.A, Main.B, Main.C, Main.Month, Main.RowNumber FROM CTE_TestColumnC AS Main ORDER BY ItemID, Month, RowNumber DROP TABLE #TestValue 

In the query plan for your query, the problem is displayed in the table scan in the lower right corner. this test data is executed 8 times with the total number of rows returned:

enter image description here

Query plans for a query with a temporary table: enter image description hereenter image description here

+5
source

I hope I correctly understood what you are trying to do.
Here is my solution:

 WITH DATA AS ( SELECT *, row_number() over (ORDER BY itemid) RN FROM TestTable), RECURSIVE AS ( SELECT itemID, BA AS C, RN FROM DATA WHERE RN = 1 UNION ALL SELECT T1.itemID, t2.C - t1.A, t1.RN FROM DATA AS T1 INNER JOIN RECURSIVE AS T2 ON t1.RN = T2.Rn+1) SELECT ItemID, C FROM RECURSIVE 

You can find the full example (with your data) here

0
source

Source: https://habr.com/ru/post/1439002/


All Articles