Aggregation of one column based on the individual value of another column

This scenario is based on a circuit in yet another question , and I am not interested in discussing the correctness of the circuit!

I am interested to know if there are any good methods in SQL Server to aggregate one column (below amount1 ) based on the separate value of another column ( id1 ).

Plan1 below twice scans table1, performs two aggregations with p_id, then combines the result. It seems that this can be improved. In some cases, query 2 may return the wrong result, but the plan is still worse!

Any ideas?

DDL

 IF OBJECT_ID('tempdb..#table1') IS NOT NULL DROP TABLE #table1; IF OBJECT_ID('tempdb..#table2') IS NOT NULL DROP TABLE #table2; CREATE TABLE #table1 (id1 int primary key nonclustered, amount1 int, p_id int); CREATE CLUSTERED INDEX ix ON #table1 (p_id,id1); INSERT INTO #table1 SELECT 1,500,10 UNION ALL SELECT 2,700,20 UNION ALL SELECT 3,500,10 UNION ALL SELECT 4,450,20 UNION ALL SELECT 5,300,10; CREATE TABLE #table2 (id2 int primary key, amount2 int, id1 int); INSERT INTO #table2 SELECT 1,300,1 UNION ALL SELECT 2,200,1 UNION ALL SELECT 3,200,2 UNION ALL SELECT 4,500,2 UNION ALL SELECT 5,400,3 UNION ALL SELECT 6,150,4 UNION ALL SELECT 7,300,4 UNION ALL SELECT 8,300,5; 

Request 1

 WITH t1 AS (SELECT p_id,SUM(amount1) AS total1 FROM #table1 GROUP BY p_id), t2 AS (SELECT p_id,SUM(amount2) AS total2 FROM #table2 table2 JOIN #table1 table1 ON table1.id1 = table2.id1 GROUP BY p_id) SELECT t1.p_id,total1,total2 FROM t1 JOIN t2 ON t1.p_id = t2.p_id 

Plan 1

Execution plan 1

Request 2

 SELECT table1.p_id, FLOOR(SUM(DISTINCT amount1 + table1.id1/100000000.0)) AS total1, SUM(amount2) AS total2 FROM #table1 table1 JOIN #table2 table2 ON table1.id1=table2.id1 GROUP BY table1.p_id 

Plan 2

Execution plan 1

+4
source share
2 answers

This one scans each record in one table only once:

 SELECT p_id, SUM(amount1) AS total1, SUM(s_amount2) AS total2 FROM #table1 t1 CROSS APPLY ( SELECT SUM(amount2) AS s_amount2 FROM #table2 t2 WHERE t2.id1 = t1.id1 ) t2 GROUP BY p_id |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1026]=(0) THEN NULL ELSE [Expr1027] END, [Expr1007]=CASE WHEN [Expr1028]=(0) THEN NULL ELSE [Expr1029] END)) |--Stream Aggregate(GROUP BY:([t1].[p_id]) DEFINE:([Expr1026]=COUNT_BIG([tempdb].[dbo].[#table1].[amount1] as [t1].[amount1]), [Expr1027]=SUM([tempdb].[dbo].[#table1].[amount1] as [t1].[amount1]), [Expr1028]=COUNT_BIG([Expr1005]), [Expr1029]=SUM([Expr1005]))) |--Nested Loops(Left Outer Join, OUTER REFERENCES:([t1].[id1])) |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#table1] AS [t1]), ORDERED FORWARD) |--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1024]=(0) THEN NULL ELSE [Expr1025] END)) |--Stream Aggregate(DEFINE:([Expr1024]=COUNT_BIG([tempdb].[dbo].[#table2].[amount2] as [t2].[amount2]), [Expr1025]=SUM([tempdb].[dbo].[#table2].[amount2] as [t2].[amount2]))) |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#table2] AS [t2]), WHERE:([tempdb].[dbo].[#table2].[id1] as [t2].[id1]=[tempdb].[dbo].[#table1].[id1] as [t1].[id1])) 

although this is not necessarily more effective.

This:

 SELECT p_id, SUM(amount1) AS total1, SUM(s_amount2) AS total2 FROM #table1 t1 JOIN ( SELECT id1, SUM(amount2) AS s_amount2 FROM #table2 GROUP BY id1 ) t2 ON t2.id1 = t1.id1 GROUP BY p_id 

will do the same with a lot of options for connections, but in the plan you can use an additional coil if t2 is chosen as the lead.

+2
source

Well, @Quassnoi's solution seems pretty good. In any case, for SQL Server 2005+, you can use the PARTITION BY to try to make a simpler query, but the execution plan is no better, although this does not necessarily mean more or less efficient.

 SELECT A.p_id, MIN(amount1) total1, SUM(amount2) total2 FROM (SELECT p_id, id1, SUM(amount1) OVER(PARTITION BY p_id) amount1 FROM #table1) A JOIN #table2 B ON A.id1 = B.id1 GROUP BY A.p_id 
+2
source

Source: https://habr.com/ru/post/1333809/


All Articles