Why does SQL Server use an index scan instead of an index seek when the WHERE clause contains parameterized values?

We found that SQL Server performs an index scan instead of an index seek if the WHERE clause contains parameterized values instead of string literals.

The following is an example:

SQL Server performs an index scan in the following case (parameters in the WHERE clause):

 declare @val1 nvarchar(40), @val2 nvarchar(40);
 set @val1 = 'val1';
 set @val2 = 'val2';

 select min(id)
 from scor_inv_binaries
 where col1 in (@val1, @val2)
 group by col1

On the other hand, the following query performs an index seek:

 select min(id)
 from scor_inv_binaries
 where col1 in ('val1', 'val2')
 group by col1

Has anyone else observed similar behavior, and how did you fix it so that the query performs an index seek instead of an index scan?

We cannot use the FORCESEEK table hint, because FORCESEEK is not supported on SQL Server 2005.
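For illustration only, this is roughly what the hint we are ruling out would look like on SQL Server 2008 and later (not usable in our environment):

 -- Sketch for reference: the FORCESEEK table hint exists only from SQL Server 2008 onwards.
 select min(id)
 from scor_inv_binaries with (forceseek)
 where col1 in (@val1, @val2)
 group by col1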

We have also already updated the statistics. Thanks in advance for your help.

+3
4 answers

Well, to answer why SQL Server does this: the batch is not compiled in logical order, each statement is compiled in its own right, so when the query plan for the SELECT statement is generated the optimizer does not know that @val1 and @val2 will become 'val1' and 'val2' respectively.

When SQL Server does not know the value, it has to make its best guess about how many times that variable will appear in the table, which can sometimes lead to suboptimal plans. My main point is that the same query with different values can generate different plans. Imagine this simple example:

 IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
     DROP TABLE #T;

 CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);

 INSERT #T (Val)
 SELECT TOP 991 1
 FROM sys.all_objects a
 UNION ALL
 SELECT TOP 9 ROW_NUMBER() OVER(ORDER BY a.object_id) + 1
 FROM sys.all_objects a;

 CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);

All I have done here is create a simple table and add 1,000 rows with values 1-10 in the Val column; however, 1 appears 991 times and the other nine values each appear only once. For the following query:

 SELECT COUNT(Filler) FROM #T WHERE Val = 1; 

it is more efficient to scan the entire table than to seek the index and then perform 991 bookmark lookups to retrieve the value of Filler. However, with only one matching row, the following query:

 SELECT COUNT(Filler) FROM #T WHERE Val = 2; 

is more efficient as an index seek with a single bookmark lookup to retrieve the value of Filler (and running both queries will confirm this).
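One way to confirm it, as a hedged sketch using the #T table from the script above, is to compare the logical reads reported by SET STATISTICS IO:

 -- Sketch: compare the I/O cost of the two plans on the sample table #T.
 SET STATISTICS IO ON;

 SELECT COUNT(Filler) FROM #T WHERE Val = 1; -- 991 matching rows: a scan is cheaper
 SELECT COUNT(Filler) FROM #T WHERE Val = 2; -- 1 matching row: a seek + lookup is cheaper

 SET STATISTICS IO OFF;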

I am not sure exactly where the tipping point between a seek with bookmark lookups and a scan lies (I am sure it varies by situation), but it is pretty low. Using the example table and a little trial and error, I found that the Val column needed 38 rows with the value 2 before the optimizer chose a full table scan over an index seek with bookmark lookups:

 IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
     DROP TABLE #T;

 DECLARE @i INT = 38;

 CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);

 INSERT #T (Val)
 SELECT TOP (991 - @i) 1
 FROM sys.all_objects a
 UNION ALL
 SELECT TOP (@i) 2
 FROM sys.all_objects a
 UNION ALL
 SELECT TOP 8 ROW_NUMBER() OVER(ORDER BY a.object_id) + 2
 FROM sys.all_objects a;

 CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);

 SELECT COUNT(Filler), COUNT(*)
 FROM #T
 WHERE Val = 2;

So for this example the tipping point is about 3.7% of matching rows.

Since the optimizer does not know how many rows will match when you use a variable, it has to guess, and the simplest guess is to take the total number of rows and divide it by the number of distinct values in the column. So in this example the estimated number of rows for WHERE Val = @Val is 1000 / 10 = 100. The actual algorithm is more complex than that, but this will do for illustration. So, when we look at the execution plan for:

 DECLARE @i INT = 2; SELECT COUNT(Filler) FROM #T WHERE Val = @i; 

[Execution plan screenshot: clustered index scan, estimated number of rows 100, actual number of rows 1]

We can see here (with the original data) that the estimated number of rows is 100, but the actual number is 1. From the earlier steps we know that with more than 38 matching rows the optimizer will choose a clustered index scan over an index seek, so since the best guess it can make for the number of rows is above that threshold, the plan for an unknown variable is a clustered index scan.
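If you want to see the guess itself, the density the optimizer works from can be inspected in the statistics on the index. A hedged sketch (the index name IX_T__Val comes from the script above; referencing a temp table through tempdb may be required depending on your session):

 -- Sketch: the "All density" value for Val should be 0.1 (1 / 10 distinct values),
 -- which multiplied by 1000 rows gives the estimate of 100 seen in the plan.
 DBCC SHOW_STATISTICS ('tempdb..#T', 'IX_T__Val') WITH DENSITY_VECTOR;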

Just to prove the theory further, if we create a table with 1,000 rows containing the numbers 1-27 distributed evenly (so the estimated row count will be approximately 1000 / 27 = 37.037):

 IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
     DROP TABLE #T;

 CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);

 INSERT #T (Val)
 SELECT TOP 27 ROW_NUMBER() OVER(ORDER BY a.object_id)
 FROM sys.all_objects a;

 INSERT #T (Val)
 SELECT TOP 973 t1.Val
 FROM #T AS t1
 CROSS JOIN #T AS t2
 CROSS JOIN #T AS t3
 ORDER BY t2.Val, t3.Val;

 CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);
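A quick hedged check that the values really are spread evenly (each of the 27 values should appear roughly 37 times, matching the density-based estimate above):

 -- Sketch: verify the distribution of Val in #T.
 SELECT Val, COUNT(*) AS Cnt
 FROM #T
 GROUP BY Val
 ORDER BY Val;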

then run the query again, we get a plan with an index seek:

 DECLARE @i INT = 2; SELECT COUNT(Filler) FROM #T WHERE Val = @i; 

[Execution plan screenshot: index seek]

So hopefully that fairly comprehensively explains why you get that plan. Now, I assume the next question is how to force a different plan, and the answer is to use the query hint OPTION (RECOMPILE) to force the query to compile at execution time, when the value of the parameter is known. Reverting to the original data, where the best plan for Val = 2 is a seek, but using a variable yields a plan with an index scan, we can run:

 DECLARE @i INT = 2;
 SELECT COUNT(Filler) FROM #T WHERE Val = @i;
 GO

 DECLARE @i INT = 2;
 SELECT COUNT(Filler) FROM #T WHERE Val = @i OPTION (RECOMPILE);

[Execution plan screenshot: the first query uses a clustered index scan, the second an index seek with a key lookup]

We can see that the latter uses an index seek and key lookup, because it has checked the value of the variable at execution time and chosen the most appropriate plan for that specific value. The drawback of OPTION (RECOMPILE) is that you lose the benefit of cached query plans, so the query has to be compiled each time it runs.

+13

Try

 declare @val1 nvarchar(40), @val2 nvarchar(40);
 set @val1 = 'val1';
 set @val2 = 'val2';

 select min(id)
 from scor_inv_binaries
 where col1 in (@val1, @val2)
 group by col1
 OPTION (RECOMPILE)
0

What data type is col1?

Your variables are nvarchar, whereas your literals are varchar/char; if col1 is varchar/char, the server may resort to an index scan so that it can implicitly cast each value of col1 to nvarchar for the comparison.
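If that is the cause, one hedged way to check and work around it (table and column names are taken from the question; this assumes col1 really is varchar):

 -- Sketch: check the declared type of col1.
 SELECT DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
 FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_NAME = 'scor_inv_binaries' AND COLUMN_NAME = 'col1';

 -- If col1 is varchar, declaring the variables as varchar avoids the implicit
 -- conversion of the indexed column and leaves the optimizer free to seek.
 declare @val1 varchar(40), @val2 varchar(40);
 set @val1 = 'val1';
 set @val2 = 'val2';

 select min(id)
 from scor_inv_binaries
 where col1 in (@val1, @val2)
 group by col1;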

0

I suspect that the first query uses a Predicate while the second query uses a Seek Predicate.

A Seek Predicate is the operation that describes the b-tree portion of the seek. A Predicate is an operation that describes an additional filter using non-key columns. Based on those descriptions, it is clear that a Seek Predicate is preferable to a Predicate, because it seeks on the index, whereas a Predicate filters on non-key columns, which means the data in the pages themselves has to be examined.
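To check which of the two you are getting, a hedged sketch: capture the actual plans for both forms of the query and compare the index operator's properties, which show either a Seek Predicate or a plain Predicate.

 -- Sketch: return the actual execution plan XML alongside the results so the
 -- Seek Predicate / Predicate on the index operator can be compared.
 SET STATISTICS XML ON;

 select min(id) from scor_inv_binaries where col1 in ('val1', 'val2') group by col1;

 declare @val1 nvarchar(40), @val2 nvarchar(40);
 set @val1 = 'val1';
 set @val2 = 'val2';
 select min(id) from scor_inv_binaries where col1 in (@val1, @val2) group by col1;

 SET STATISTICS XML OFF;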

For more information, visit: - https://social.msdn.microsoft.com/Forums/sqlserver/en-US/36a176c8-005e-4a7d-afc2-68071f33987a/predicate-and-seek-predicate

0
