Using 'where then Union' or Use 'Union then Where'

You have these two types of queries:

--query1 Select someFields From someTables Where someWhereClues Union all Select someFields FROM some Tables Where someWhereClues --query2 Select * FROM ( Select someFields From someTables Union all Select someFields FROM someTables ) DT Where someMixedWhereClues 

Note:
In both queries, the final result fields are the same.

I thought 1st. the query is faster or its performance is better!
But after some research, I confused this note :

SQL Server (as a sample RDBMS) first reads the whole data, and then searches for records. => so that both queries will record and search for all records.

Please help me with my misunderstandings and whether there are any other differences between query1 and query2?


Edit : adding sample plans:

 select t.Name, t.type from sys.tables t where t.type = 'U' union all select t.Name, t.type from sys.objects t where t.type = 'U' select * from ( select t.Name, t.type from sys.tables t union all select t.Name, t.type from sys.objects t ) dt where dt.type = 'U' 

Implementation Plans: enter image description hereenter image description here

both are the same and 50%

+6
source share
6 answers

SQL Server Query Optimizer optimizes both queries to get almost the same performance.

+5
source

The first cannot be slower. Here is the reasoning:

  • If the WHERE clauses in the first case can efficiently use INDEX , fewer rows will be collected for collection together in UNION . Less lines -> faster.
  • The second one does not have INDEX on UNION , therefore WHERE cannot be optimized this way.

This is what may cause the former to be slower. But I consider them as exceptions, not the rule.

  • A different amount of parallelism has been achieved.
  • At the time of starting the requests, different files are cached.

Caution: I assume that all three WHERE clauses are identical (as your example shows).

+1
source

As a rule, I will always consider the first type of request.

In made samples and queries with simple WHERE predicates, both will use the same plan. But in a more complex query, with more complex predicates, the optimizer may not find an equally effective solution for the second type of query (it is just an optimizer and is bound by resource and time constraints). The more complex the query, the less likely it is that the optimizer will find a better execution plan (since this will lead to a timeout and choose the worst plan found so far). And it gets even worse if predicates have ORed.

+1
source

In my practice, the first option has never been slower than the second. I believe that the optimizer is smart enough to optimize these plans more or less the same. However, I did some tests, and the first option was always better. For instance:

 CREATE TABLE #a ( a INT, b INT ); WITH Numbers ( I ) AS ( SELECT 1000 UNION ALL SELECT I + 1 FROM Numbers WHERE I < 5000 ) INSERT INTO #a ( a ) SELECT I FROM Numbers ORDER BY CRYPT_GEN_RANDOM(4) OPTION ( MAXRECURSION 0 ); WITH Numbers ( I ) AS ( SELECT 1000 UNION ALL SELECT I + 1 FROM Numbers WHERE I < 5000 ) INSERT INTO #a ( b ) SELECT I FROM Numbers ORDER BY CRYPT_GEN_RANDOM(4) OPTION ( MAXRECURSION 0 ); SELECT a, b FROM #a WHERE a IS NOT NULL UNION ALL SELECT a, b FROM #a WHERE b IS NOT NULL SELECT * FROM ( SELECT a, b FROM #a UNION ALL SELECT a, b FROM #a ) c WHERE a IS NOT NULL OR b IS NOT NULL 

The result is 47% against 53%

0
source

In my experience, there is no direct answer to this question, and it depends on the nature of the basic request. As you have shown, the optimizer has the same execution plan in both of these scenarios, but this is not always the case. Performance is usually similar, but sometimes performance can vary greatly depending on the request. In general, I only look more closely at this when performance is poor for a good reason.

0
source

SQLServer optimizes both of these queries to the same one, as shown in the execution plans that you posted. He can do this because in this case the queries are quite simple; otherwise it may turn out differently. While you are composing a query, you should try to follow the same general rules as the optimizer, and filter as soon as possible to limit the returned result set. By announcing that you only want to get the ā€œUā€ records first, and then combine these results, you will prepare a query for subsequent revisions that may invalidate the choice of optimizer that leads to the same execution plan.

In short, you don't need to make simple queries be optimal, but it's a good habit to have, and this will help when creating more complex queries.

0
source

Source: https://habr.com/ru/post/983910/


All Articles