Simplify DELETE Execution

Question

Original request

 DELETE B FROM TABLE_BASE B, TABLE_INC I WHERE B.ID = I.ID AND B.NUM = I.NUM;

Performance statistics for the above query

 +-----------------+--------+-----------+
 | Response Time   | SumCPU | ImpactCPU |
 +-----------------+--------+-----------+
 | 00:05:29.190000 | 2852   | 319672    |
 +-----------------+--------+-----------+

Optimized query 1

 DEL FROM TABLE_BASE WHERE (ID, NUM) IN (SELECT ID, NUM FROM TABLE_INC);

Statistics for the above query

 +-----------------+--------+-----------+
 | QryRespTime     | SumCPU | ImpactCPU |
 +-----------------+--------+-----------+
 | 00:00:00.570000 | 15.42  | 49.92     |
 +-----------------+--------+-----------+

Optimized Query 2

 DELETE FROM TABLE_BASE B WHERE EXISTS (SELECT * FROM TABLE_INC I WHERE B.ID = I.ID AND B.NUM = I.NUM);

Statistics for the above query

 +-----------------+--------+-----------+
 | QryRespTime     | SumCPU | ImpactCPU |
 +-----------------+--------+-----------+
 | 00:00:00.400000 | 11.96  | 44.93     |
 +-----------------+--------+-----------+

My questions are:

How and why do optimized queries 1 and 2 improve performance so significantly?
What is the best practice for such DELETE queries?
Should I choose Query 1 or Query 2? Which one is better / more reliable? I expected Query 1 to be the better one, because it uses SELECT ID, NUM instead of SELECT *, cutting the subquery down to only two columns, but Query 2 shows better results.

 QUERY 1 EXPLAIN PLAN

 This query is optimized using type 2 profile T2_Linux64, profileid 21.
 1) First, we lock TEMP_DB.TABLE_BASE for write on a reserved RowHash to prevent global deadlock.
 2) Next, we lock TEMP_DB_T.TABLE_INC for access, and we lock TEMP_DB.TABLE_BASE for write.
 3) We execute the following steps in parallel.
    1) We do an all-AMPs RETRIEVE step from TEMP_DB.TABLE_BASE by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by the hash code of (TEMP_DB.TABLE_BASE.NUM, TEMP_DB.TABLE_BASE.ID) to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with low confidence to be 168,480 rows (5,054,400 bytes). The estimated time for this step is 0.03 seconds.
    2) We do an all-AMPs RETRIEVE step from TEMP_DB_T.TABLE_INC by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is redistributed by the hash code of (TEMP_DB_T.TABLE_INC.NUM, TEMP_DB_T.TABLE_INC.ID) to all AMPs. Then we do a SORT to order Spool 3 by row hash and the sort key in spool field1 eliminating duplicate rows. The size of Spool 3 is estimated with high confidence to be 5,640 rows (310,200 bytes). The estimated time for this step is 0.03 seconds.
 4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to Spool 3 (Last Use) by way of an all-rows scan. Spool 2 and Spool 3 are joined using an inclusion merge join, with a join condition of ("(ID = ID) AND (NUM = NUM)"). The result goes into Spool 1 (all_amps), which is redistributed by the hash code of (TEMP_DB.TABLE_BASE.ROWID) to all AMPs. Then we do a SORT to order Spool 1 by row hash and the sort key in spool field1 eliminating duplicate rows. The size of Spool 1 is estimated with no confidence to be 168,480 rows (3,032,640 bytes). The estimated time for this step is 1.32 seconds.
 5) We do an all-AMPs MERGE DELETE to TEMP_DB.TABLE_BASE from Spool 1 (Last Use) via the row id. The size is estimated with no confidence to be 168,480 rows. The estimated time for this step is 42.95 seconds.
 6) We spoil the parser dictionary cache for the table.
 7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
 -> No rows are returned to the user as the result of statement 1.

 QUERY 2 EXPLAIN PLAN

 This query is optimized using type 2 profile T2_Linux64, profileid 21.
 1) First, we lock TEMP_DB.TABLE_BASE for write on a reserved RowHash to prevent global deadlock.
 2) Next, we lock TEMP_DB_T.TABLE_INC for access, and we lock TEMP_DB.TABLE_BASE for write.
 3) We execute the following steps in parallel.
    1) We do an all-AMPs RETRIEVE step from TEMP_DB.TABLE_BASE by way of an all-rows scan with no residual conditions into Spool 2 (all_amps), which is redistributed by the hash code of (TEMP_DB.TABLE_BASE.NUM, TEMP_DB.TABLE_BASE.ID) to all AMPs. Then we do a SORT to order Spool 2 by row hash. The size of Spool 2 is estimated with low confidence to be 168,480 rows (5,054,400 bytes). The estimated time for this step is 0.03 seconds.
    2) We do an all-AMPs RETRIEVE step from TEMP_DB_T.TABLE_INC by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is redistributed by the hash code of (TEMP_DB_T.TABLE_INC.NUM, TEMP_DB_T.TABLE_INC.ID) to all AMPs. Then we do a SORT to order Spool 3 by row hash and the sort key in spool field1 eliminating duplicate rows. The size of Spool 3 is estimated with high confidence to be 5,640 rows (310,200 bytes). The estimated time for this step is 0.03 seconds.
 4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to Spool 3 (Last Use) by way of an all-rows scan. Spool 2 and Spool 3 are joined using an inclusion merge join, with a join condition of ("(NUM = NUM) AND (ID = ID)"). The result goes into Spool 1 (all_amps), which is redistributed by the hash code of (TEMP_DB.TABLE_BASE.ROWID) to all AMPs. Then we do a SORT to order Spool 1 by row hash and the sort key in spool field1 eliminating duplicate rows. The size of Spool 1 is estimated with no confidence to be 168,480 rows (3,032,640 bytes). The estimated time for this step is 1.32 seconds.
 5) We do an all-AMPs MERGE DELETE to TEMP_DB.TABLE_BASE from Spool 1 (Last Use) via the row id. The size is estimated with no confidence to be 168,480 rows. The estimated time for this step is 42.95 seconds.
 6) We spoil the parser dictionary cache for the table.
 7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
 -> No rows are returned to the user as the result of statement 1.

For TABLE_BASE

 +----------------+----------+
 | table_bytes    | skewness |
 +----------------+----------+
 | 16842085888.00 | 22.78    |
 +----------------+----------+

For TABLE_INC

 +-------------+----------+
 | table_bytes | skewness |
 +-------------+----------+
 | 5317120.00  | 44.52    |
 +-------------+----------+

performance sql teradata

Pirate x Nov 16 '16 at 8:12

1 answer

dnoeth · Answer 1 · 2016-11-16T09:23:11+0000

What is the relationship between TABLE_BASE and TABLE_INC ?

If it is one-to-many, Q1 (the original join DELETE) probably creates a huge spool first, while Q2 & Q3 (the two optimized queries) can apply a DISTINCT before the join.
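One way to check the relationship is to look for repeated (ID, NUM) pairs in TABLE_INC. This is only a sketch using the table and column names from the question; if it returns rows, each matching TABLE_BASE row is joined more than once in the join-style DELETE, which would inflate the spool:

```sql
-- List (ID, NUM) pairs that occur more than once in TABLE_INC.
-- An empty result means the join cannot multiply TABLE_BASE rows.
SELECT ID, NUM, COUNT(*) AS dup_count
FROM TABLE_INC
GROUP BY ID, NUM
HAVING COUNT(*) > 1
ORDER BY dup_count DESC;
```

The EXPLAIN output for the optimized queries sorts Spool 3 "eliminating duplicate rows", which is the DISTINCT step the join-style DELETE would not get.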

Regarding IN vs. EXISTS, there should not be any difference. Have you checked dbc.QryLogStepsV?
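A rough sketch of such a check, comparing estimated and actual row counts per step for one logged request. It assumes step-level query logging was already enabled for the submitting user (for example with BEGIN QUERY LOGGING WITH STEPINFO), the QueryID value is a placeholder to be replaced with the one found in dbc.QryLogV, and the column list should be verified against your Teradata release:

```sql
-- Per-step actuals for a single logged request.
-- 123456789012345678 is a placeholder QueryID, not a real value.
SELECT StepLev1Num,
       StepLev2Num,
       StepName,
       CPUTime,
       IOCount,
       EstRowCount,
       RowCount
FROM dbc.QryLogStepsV
WHERE QueryID = 123456789012345678
ORDER BY StepLev1Num, StepLev2Num;
```

Running this for each of the DELETE variants and comparing the steps side by side should show where the CPU goes, for example a join step whose RowCount is far above EstRowCount.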

Edit:

If (ID, Num) is the PI of the target table, rewriting the statement as a MERGE DELETE should provide better performance:

 MERGE INTO TABLE_BASE AS tgt
 USING TABLE_INC AS src
 ON src.ID = tgt.ID
 AND src.Num = tgt.Num
 WHEN MATCHED THEN DELETE;
