Filter or join first?

I first wrote a simple query to find customers who placed more than one order in a given year. In query 1, I filter the result set first and then join it to another result set that supplies the customer's name. I expected filtering first to perform better, since it leaves fewer rows to join. I then wrote a second query that joins first and filters afterward, which looks neater than the first. The timings matched my expectation: the filter-first query took less time on every run. But I'm not sure which timing figure matters most. Or is this result just a coincidence? How should I think about performance here?

    use [AdventureWorks2012]
    set statistics time on;

    --1.filter first,join second
    select tempC.*, tempP.FirstName, tempP.LastName
    from (
        select Year(OrderDate) As OrderYear, CustomerID, count(CustomerID) As CustomerOrderAmt
        from Sales.SalesOrderHeader
        group by Year(OrderDate), CustomerID
        having count(CustomerID) > 1
    ) as tempC
    join (
        select p.FirstName, p.LastName, c.CustomerID
        from Person.Person as p
        join Sales.Customer as c on c.PersonID = p.BusinessEntityID
    ) as tempP on tempC.CustomerID = tempP.CustomerID
    order by tempC.OrderYear, tempC.CustomerID
    GO

    --2.join first,filter second
    select Year(so.OrderDate) As Orderdate, so.CustomerID, count(so.CustomerID) As CustomerOrderAmt, p.FirstName, p.LastName
    from Sales.SalesOrderHeader as so
    join Sales.Customer as c on so.CustomerID = c.CustomerID
    join Person.Person as p on c.PersonID = p.BusinessEntityID
    group by Year(so.OrderDate), so.CustomerID, p.FirstName, p.LastName
    having count(so.CustomerID) > 1
    go
2 answers

The query optimizer can perform the operations in any order that produces the same logical result. So even if you write the query to filter first and then join, the optimizer may still join first and filter afterward unless you force the order by materializing the intermediate result in a temp table or table variable.

If you really think the optimizer is doing something stupid, you can try things like a table variable or a temp table. But what looks stupid often is not, for reasons that get fairly advanced.
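As a rough sketch of the temp-table approach, the filtered aggregate from the question can be materialized first and then joined. The temp table name #CustomerYearCounts is an assumption made up for this illustration; the tables and columns are the ones used in the question. Whether this is actually faster than a single statement has to be measured, since it adds the cost of writing to tempdb.

```sql
USE AdventureWorks2012;
GO

-- Drop a leftover copy from an earlier run, if any.
-- #CustomerYearCounts is a hypothetical name chosen for this sketch.
IF OBJECT_ID('tempdb..#CustomerYearCounts') IS NOT NULL
    DROP TABLE #CustomerYearCounts;

-- Step 1: filter first. The aggregate is computed and stored
-- before any join to the name tables can happen.
SELECT YEAR(OrderDate)   AS OrderYear,
       CustomerID,
       COUNT(CustomerID) AS CustomerOrderAmt
INTO   #CustomerYearCounts
FROM   Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate), CustomerID
HAVING COUNT(CustomerID) > 1;

-- Step 2: join second. Only the already filtered rows are joined.
SELECT t.OrderYear,
       t.CustomerID,
       t.CustomerOrderAmt,
       p.FirstName,
       p.LastName
FROM   #CustomerYearCounts AS t
JOIN   Sales.Customer AS c ON c.CustomerID = t.CustomerID
JOIN   Person.Person  AS p ON p.BusinessEntityID = c.PersonID
ORDER BY t.OrderYear, t.CustomerID;

DROP TABLE #CustomerYearCounts;
GO
```

Because the two steps are separate statements, the optimizer cannot merge them into one plan, which is what forces the order here.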

However, how you write the query sometimes does influence what the optimizer does, so the main thing to look at is the execution plans. If they are the same, use the clearest code. If they differ, test, test again, and go with whichever performs best.
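One possible way to collect the actual plan together with I/O and timing figures for a candidate query is sketched below. It assumes the query is run from a client such as SQL Server Management Studio that can display the XML plan; the query itself is the join-first version from the question.

```sql
USE AdventureWorks2012;
GO

SET STATISTICS IO ON;    -- logical and physical reads per table
SET STATISTICS TIME ON;  -- parse/compile and execution CPU and elapsed time
SET STATISTICS XML ON;   -- returns the actual execution plan as an extra result set
GO

-- Candidate query under test (join first, filter second).
SELECT YEAR(so.OrderDate)   AS OrderYear,
       so.CustomerID,
       COUNT(so.CustomerID) AS CustomerOrderAmt,
       p.FirstName,
       p.LastName
FROM   Sales.SalesOrderHeader AS so
JOIN   Sales.Customer AS c ON so.CustomerID = c.CustomerID
JOIN   Person.Person  AS p ON c.PersonID = p.BusinessEntityID
GROUP BY YEAR(so.OrderDate), so.CustomerID, p.FirstName, p.LastName
HAVING COUNT(so.CustomerID) > 1;
GO

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
```

Running each version of the query several times this way and comparing the plans and the read counts tends to be more informative than a single elapsed-time figure, which also depends on caching and on other load on the server.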


I find it good practice to use subqueries to reduce the total number of join operations and the number of columns in the GROUP BY clause. So I'll say right away that the first query is definitely more efficient.

Queries

    SELECT t.OrderYear
         , t.CustomerID
         , t.CustomerOrderAmt
         , p.FirstName
         , p.LastName
    FROM (
        SELECT OrderYear = YEAR(OrderDate)
             , CustomerID
             , CustomerOrderAmt = COUNT(CustomerID)
        FROM Sales.SalesOrderHeader
        GROUP BY YEAR(OrderDate)
               , CustomerID
        HAVING COUNT(CustomerID) > 1
    ) t
    JOIN (
        SELECT p.FirstName
             , p.LastName
             , c.CustomerID
        FROM Person.Person p
        JOIN Sales.Customer c ON c.PersonID = p.BusinessEntityID
    ) p ON t.CustomerID = p.CustomerID
    ORDER BY t.OrderYear
           , t.CustomerID

vs

    SELECT Orderdate = YEAR(so.OrderDate)
         , so.CustomerID
         , CustomerOrderAmt = COUNT(so.CustomerID)
         , FirstName = MAX(p.FirstName)
         , LastName = MAX(p.LastName)
    FROM Sales.SalesOrderHeader so
    JOIN Sales.Customer c ON so.CustomerID = c.CustomerID
    JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
    GROUP BY YEAR(so.OrderDate)
           , so.CustomerID
    HAVING COUNT(so.CustomerID) > 1

Query cost:

[image: query cost]

Execution time:

    -- first query
    SQL Server Execution Times:
       CPU time = 94 ms, elapsed time = 395 ms.

    -- second query
    SQL Server Execution Times:
       CPU time = 140 ms, elapsed time = 480 ms.

Source: https://habr.com/ru/post/947798/

