Why does assigning a query's COUNT to a variable perform better than checking it inline?

I recently did some performance tuning, and I want to share the result here and try to understand why the improvement happened.

In one of my procedures, I wanted to return a dataset depending on whether some other records exist.

My query:

IF (SELECT COUNT(1) FROM ...) > 0 SELECT ...

This query took about 5 seconds.

I changed it to assign the count to a variable first, and then checked the variable.

 DECLARE @cnt INT = 0
 SELECT @cnt = COUNT(1) FROM ...
 IF @cnt > 0 SELECT ...

This version takes less than 1 second to run.

I also tried IF EXISTS, but got the same result as before the change (about 5 seconds).

I am very curious why the optimizer behaves so differently, and whether there is a specific explanation for this.

Thanks.

1 answer

There are two parts to this.

1) The SQL Server optimizer converts

 IF (SELECT COUNT(1) FROM ...) > 0 SELECT ...

into

 IF EXISTS(SELECT 1 FROM ...) SELECT ...

I saw this pointed out by Adam Machanic in his comment on Andrew Kelly's post "Exists Vs. Count(*) - The battle never ends":

It is interesting to note that in SQL Server 2005, if there is an index available to allow a seek, the COUNT(*) > 0 test will be optimized and behave the same as EXISTS.

Adam provided a demo.


2) Sometimes EXISTS performs worse than COUNT:

IF EXISTS taking longer than the embedded SELECT statement

Checking existence with EXISTS outperforms COUNT! ... Not?

As Paul White wrote:

Using EXISTS introduces a row goal, where the optimizer produces an execution plan aimed at locating the first row quickly. In doing so, it assumes that the data is uniformly distributed. For example, if statistics show that there are 100 expected matches in 100,000 rows, it will assume it only has to read 1,000 rows to find the first match.

This will result in longer than expected execution times if this assumption turns out to be wrong. For example, if SQL Server chooses an access method (for example, an unordered scan) that happens to locate the first matching value very late in the search, it could result in an almost complete scan. On the other hand, if a matching row happens to be found among the first few rows, performance will be very good. This is the fundamental risk with row goals: inconsistent performance.
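The arithmetic behind the 1,000-row figure in the quote can be sketched in T-SQL. This is only a simplified model of the uniform-distribution assumption, not the optimizer's actual costing formula, and the variable names are made up for illustration:

```sql
-- Simplified model of the uniform-distribution assumption (illustrative only;
-- the real optimizer estimate is more involved).
DECLARE @total_rows INT = 100000;      -- rows in the table, from the quoted example
DECLARE @expected_matches INT = 100;   -- matches predicted by statistics

-- If matches are assumed to be evenly spread, one match is expected
-- roughly every (total rows / expected matches) rows.
SELECT @total_rows / @expected_matches AS estimated_rows_to_first_match;  -- 1000
```

If the matching rows are actually clustered at the end of the scan order, the real number of rows read can be close to @total_rows instead.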


If your data distribution is skewed, or if you expect that in most cases the COUNT will be zero (i.e. the whole table has to be scanned to get the answer anyway), then you should try to get a plan without a row goal (i.e. without EXISTS).

One obvious way, as you found, is to store the COUNT result in a variable.
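For illustration, the two forms might look like this side by side. The table dbo.Orders and its columns are hypothetical stand-ins for the parts elided in the question, so treat this as a sketch rather than the asker's actual code:

```sql
-- Hypothetical table and column names, for illustration only.

-- Form 1: the count is tested inline. The optimizer may treat this like
-- EXISTS and build the plan with a row goal.
IF (SELECT COUNT(1) FROM dbo.Orders WHERE Status = 'Pending') > 0
    SELECT OrderId, CustomerId
    FROM dbo.Orders
    WHERE Status = 'Pending';

-- Form 2: the count is stored in a variable first. The COUNT runs as a
-- separate statement, and the IF only compares a variable.
DECLARE @cnt INT = 0;

SELECT @cnt = COUNT(1)
FROM dbo.Orders
WHERE Status = 'Pending';

IF @cnt > 0
    SELECT OrderId, CustomerId
    FROM dbo.Orders
    WHERE Status = 'Pending';
```

Comparing the actual execution plans of both forms on your own data is the way to confirm whether a row goal is what makes the difference in your case.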


Source: https://habr.com/ru/post/1241094/

