UNION ALL Performance in SQL Server 2005

I have a query with a long CTE chain that ends in

    SELECT RegionName, AreaName, CityName, SubCityName, StreetName
    FROM tDictionaryStreets
    UNION ALL
    SELECT RegionName, AreaName, CityName, SubCityName, StreetName
    FROM tDictionaryRegions

This query takes 1450 ms to execute. When I run the two SELECTs separately, they take much less time. For the query

 SELECT RegionName, AreaName, CityName, SubCityName, StreetName FROM tDictionaryStreets 

the runtime is 106 ms, and for the query

 SELECT RegionName, AreaName, CityName, SubCityName, StreetName FROM tDictionaryRegions 

it's 20 ms.

Why does UNION ALL increase execution time by more than 10 times? What can I do to reduce it?

Thank you for your help.

UPDATE: The entire query (I abbreviated it, but the problem is still present):

    WITH tFoundRegions AS (
        SELECT KladrItemName
        FROM dbo.tBuiltKladrItemsWithQuants
        WHERE UserID = @UserID AND (indeces & 1) > 0
    ),
    tFoundAreas AS (
        SELECT KladrItemName
        FROM dbo.tBuiltKladrItemsWithQuants
        WHERE UserID = @UserID AND (indeces & 2) > 0
    ),
    tFoundCities AS (
        SELECT KladrItemName
        FROM dbo.tBuiltKladrItemsWithQuants
        WHERE UserID = @UserID AND (indeces & 4) > 0
    ),
    tFoundSubCities AS (
        SELECT KladrItemName
        FROM dbo.tBuiltKladrItemsWithQuants
        WHERE UserID = @UserID AND (indeces & 8) > 0
    ),
    tFoundStreets AS (
        SELECT KladrItemName
        FROM dbo.tBuiltKladrItemsWithQuants
        WHERE UserID = @UserID AND (indeces & 16) > 0
    ),
    tDictionaryStreets AS (
        SELECT DISTINCT
            CASE WHEN RegionName IN (SELECT KladrItemName FROM tFoundRegions) THEN RegionName ELSE NULL END RegionName,
            CASE WHEN AreaName IN (SELECT KladrItemName FROM tFoundAreas) THEN AreaName ELSE NULL END AreaName,
            CASE WHEN CityName IN (SELECT KladrItemName FROM tFoundCities) THEN CityName ELSE NULL END CityName,
            CASE WHEN SubCityName IN (SELECT KladrItemName FROM tFoundSubCities) THEN SubCityName ELSE NULL END SubCityName,
            StreetName
        FROM StreetNames
        WHERE StreetName IN (SELECT KladrItemName FROM tFoundStreets)
    ),
    tMissingSubCities AS (
        SELECT KladrItemName
        FROM tFoundSubCities
        WHERE KladrItemName NOT IN (SELECT SubCityName FROM tDictionaryStreets)
    ),
    tDictionarySubCities AS (
        SELECT DISTINCT
            CASE WHEN RegionName IN (SELECT KladrItemName FROM tFoundRegions) THEN RegionName ELSE NULL END RegionName,
            CASE WHEN AreaName IN (SELECT KladrItemName FROM tFoundAreas) THEN AreaName ELSE NULL END AreaName,
            CASE WHEN CityName IN (SELECT KladrItemName FROM tFoundCities) THEN CityName ELSE NULL END CityName,
            SubCityName,
            NULL StreetName
        FROM SubCityNames
        WHERE SubCityName IN (SELECT KladrItemName FROM tMissingSubCities)
    )
    SELECT RegionName, AreaName, CityName, SubCityName, StreetName
    FROM tDictionaryStreets
    UNION ALL
    SELECT RegionName, AreaName, CityName, SubCityName, StreetName
    FROM tDictionarySubCities
+4
4 answers

Make sure you clear the plan cache and the data cache between test runs.

e.g.

    DBCC FREEPROCCACHE
    DBCC DROPCLEANBUFFERS

If you run the UNION ALL version first and then run the two SELECTs separately, the data will already be cached in memory, which makes them much faster (and gives the false impression that the later approach is faster when it may not be).

A plain UNION could be slower because it has to apply a DISTINCT, but UNION ALL does not, so it should make no difference.
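A cold-cache timing run could be sketched as below. The `CHECKPOINT` and the `SET STATISTICS` switches are additions that the answer does not mention; `CHECKPOINT` is included on the assumption that dirty pages should be written out first so that `DBCC DROPCLEANBUFFERS` can actually empty the buffer pool. Both DBCC commands affect the whole instance, so this assumes a test server.

```sql
-- Sketch of one cold-cache measurement; repeat the whole script per query variant.
CHECKPOINT;               -- flush dirty pages so they become droppable clean pages
DBCC DROPCLEANBUFFERS;    -- empty the data cache
DBCC FREEPROCCACHE;       -- discard cached execution plans

SET STATISTICS TIME ON;   -- report compile and execution times in the Messages tab
SET STATISTICS IO ON;     -- report logical and physical reads per table

-- Run exactly ONE variant here: the UNION ALL query, or one of the two SELECTs.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running each variant from the same cold state keeps the comparison from depending on which query happened to run first.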

Update:
Look at the execution plans and compare them to see if there is a difference. You can view the execution plan by clicking the “Include Actual Execution Plan” button in SSMS before running the query.
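The actual plan can also be captured from a script instead of the SSMS button. The following is a minimal sketch, using one of the CTE bodies from the question as a stand-in query; the `INT` type and the value of `@UserID` are assumptions, since the thread does not give them.

```sql
DECLARE @UserID INT;   -- assumed type, not stated in the question
SET @UserID = 1;       -- hypothetical value

SET STATISTICS XML ON;   -- return the actual execution plan as XML next to the results

SELECT KladrItemName
FROM dbo.tBuiltKladrItemsWithQuants
WHERE UserID = @UserID AND (indeces & 16) > 0;

SET STATISTICS XML OFF;
```

Saving the XML for the UNION ALL query and for each SELECT on its own makes it easier to compare the plans operator by operator.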

Update 2:
Based on the full CTE definitions, I would look at optimizing those; I don't think UNION ALL is actually the problem.

IMHO, it's best to work through the CTEs one by one and optimize each separately, so that they perform better when combined in the main query.

e.g. for tDictionaryStreets, how about this:

    SELECT DISTINCT
        r.KladrItemName AS RegionName,
        a.KladrItemName AS AreaName,
        c.KladrItemName AS CityName,
        sc.KladrItemName AS SubCityName,
        s.StreetName
    FROM StreetNames s
    JOIN tFoundStreets fs ON s.StreetName = fs.KladrItemName
    LEFT JOIN tFoundRegions r ON s.RegionName = r.KladrItemName
    LEFT JOIN tFoundAreas a ON s.AreaName = a.KladrItemName
    LEFT JOIN tFoundCities c ON s.CityName = c.KladrItemName
    LEFT JOIN tFoundSubCities sc ON s.SubCityName = sc.KladrItemName

KladrItemName in each table should at least have an index. Try reworking tDictionarySubCities in the same way, using joins.
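As a rough sketch of the indexing suggestion: all of the tFound* CTEs read dbo.tBuiltKladrItemsWithQuants filtered by UserID, so one index might serve them all. The index names are hypothetical, and the definitions assume that KladrItemName and StreetName are narrow enough to be index key columns and that no equivalent indexes exist yet.

```sql
-- Hypothetical index for the tFound* CTEs: seek on UserID, join on KladrItemName,
-- with indeces included so the bitmask filter needs no lookup into the base table.
CREATE NONCLUSTERED INDEX IX_tBuiltKladrItemsWithQuants_UserID_KladrItemName
    ON dbo.tBuiltKladrItemsWithQuants (UserID, KladrItemName)
    INCLUDE (indeces);

-- Hypothetical index for the other side of the join in tDictionaryStreets.
CREATE NONCLUSTERED INDEX IX_StreetNames_StreetName
    ON StreetNames (StreetName);
```

Whether these help depends on the existing clustered indexes and the data distribution, so the execution plan should be checked again afterwards.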

+3

Could you compare the execution plans? What is the difference? UNION ALL should work fine, since it does no duplicate elimination (which requires sorting, and that is expensive for large data sets).

0

It may be the network (unlikely) or memory, depending on the number of rows each result returns. One way to check whether it is the network or the server is to enable client statistics in SSMS (Query > Include Client Statistics, Shift+Alt+S). There you can see where most of the time is spent.

Could you compare the execution plans? [...] lmsasu [...] When the query runs fast, it uses a "merge join"; when it is slow, a "nested loop". [...]

I can't comment, but what you see in the execution plan is the difference between "joining" two result sets (merge) and RBAR (pronounced "ree-bar": Row By Agonizing Row [Jeff Moden]), usually called a loop.

Merge Join: SQL Server finds two result sets with a common reference and performs a set-based operation to combine them.

Nested loop: SQL Server cannot find a common reference, so it compares one row from set 1 against all rows from set 2, row by row, and discards those that do not match.
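One way to see how much the join strategy matters is to run the same join under both strategies with query hints. This is a diagnostic sketch only, not something taken from the thread: the join below is a simplified stand-in for the CTEs, the type and value of `@UserID` are assumptions, and the optimizer may fail to produce a plan if a hint cannot be honored.

```sql
DECLARE @UserID INT;   -- assumed type, not stated in the question
SET @UserID = 1;       -- hypothetical value

SET STATISTICS TIME ON;

-- Variant 1: restrict the optimizer to merge joins.
SELECT s.StreetName
FROM StreetNames s
JOIN dbo.tBuiltKladrItemsWithQuants k ON s.StreetName = k.KladrItemName
WHERE k.UserID = @UserID AND (k.indeces & 16) > 0
OPTION (MERGE JOIN);

-- Variant 2: restrict the optimizer to nested loops.
SELECT s.StreetName
FROM StreetNames s
JOIN dbo.tBuiltKladrItemsWithQuants k ON s.StreetName = k.KladrItemName
WHERE k.UserID = @UserID AND (k.indeces & 16) > 0
OPTION (LOOP JOIN);

SET STATISTICS TIME OFF;
```

If the two timings differ as much as the 1450 ms and 106 ms in the question, the plan choice rather than UNION ALL itself is the likely cause.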

It looks like SQL Server is encountering NULLs, which are unknown values. Try assigning a placeholder value similar to "XYZ" (or anything that does not occur in the data), which you can simply filter out in the final query. This may avoid a nested loop for certain result sets, as the values are then known rather than unknown. Something like:

    [...]
    tDictionarySubCities AS (
        SELECT DISTINCT
            CASE WHEN RegionName IN (SELECT KladrItemName FROM tFoundRegions) THEN RegionName ELSE 'XYZXYZ' END RegionName,
            CASE WHEN AreaName IN (SELECT KladrItemName FROM tFoundAreas) THEN AreaName ELSE 'XYZXYZ' END AreaName,
            CASE WHEN CityName IN (SELECT KladrItemName FROM tFoundCities) THEN CityName ELSE 'XYZXYZ' END CityName,
            SubCityName,
            NULL StreetName
        FROM SubCityNames
        WHERE SubCityName IN (SELECT KladrItemName FROM tMissingSubCities)
    )
    SELECT RegionName, AreaName, CityName, SubCityName, StreetName
    FROM tDictionaryStreets
    WHERE RegionName <> 'XYZXYZ'
    UNION ALL
    SELECT RegionName, AreaName, CityName, SubCityName, StreetName
    FROM tDictionarySubCities
    WHERE RegionName <> 'XYZXYZ'
0

I ran into a similar problem, and after careful analysis it seems to me that using a CTE in a UNION ALL query disables parallelism (which is most likely a bug).

In other words, the UNION ALL takes as long as the sum of the two queries when each of them is run with (MAXDOP 1).
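That claim could be checked with a sketch like the one below, which times each branch with parallelism switched off and then the combined query without a hint. The branches are simplified stand-ins taken from the CTE bodies in the question, and the type and value of `@UserID` are assumptions.

```sql
DECLARE @UserID INT;   -- assumed type, not stated in the question
SET @UserID = 1;       -- hypothetical value

SET STATISTICS TIME ON;

-- Branch 1, forced to run serially.
SELECT KladrItemName
FROM dbo.tBuiltKladrItemsWithQuants
WHERE UserID = @UserID AND (indeces & 1) > 0
OPTION (MAXDOP 1);

-- Branch 2, forced to run serially.
SELECT KladrItemName
FROM dbo.tBuiltKladrItemsWithQuants
WHERE UserID = @UserID AND (indeces & 2) > 0
OPTION (MAXDOP 1);

-- Combined query with no hint, so the optimizer is free to parallelize.
SELECT KladrItemName
FROM dbo.tBuiltKladrItemsWithQuants
WHERE UserID = @UserID AND (indeces & 1) > 0
UNION ALL
SELECT KladrItemName
FROM dbo.tBuiltKladrItemsWithQuants
WHERE UserID = @UserID AND (indeces & 2) > 0;

SET STATISTICS TIME OFF;
```

If the combined elapsed time is close to the sum of the two serial timings, and its actual plan shows no parallelism operators, that would be consistent with the behaviour described here.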

More testing is needed, and it is actually difficult to construct a query that uses parallelism in order to verify this, or even to report it as a bug on Microsoft Connect. Still, your problem, as well as the one described in "Why CTE (recursive) is not parallelized (MAXDOP=8)?", is evidence that such a problem actually exists.

EDIT: I tested more extensively, and although UNION ALL is parallelized in many cases, there are still situations where a query without UNION ALL is parallelized but adding UNION ALL disables it.

Although this may be a bug, it may also be because the query optimizer does not look for the best plan, only for a good enough one. Since the two queries combined by UNION already generate complex plans, as does the query with the CTEs, it may simply settle on a good enough plan before even considering parallelism.

0

Source: https://habr.com/ru/post/1300384/

