Why is the query not using a specific index?

Question

Why is the query not using a specific index?

I have a table called Workflow. It has 37M lines. The ID (int) column has a primary key plus an additional column. The ID column is the first column in the index.

If I run the following query, PK will not be used (unless you use the index hint)

Select Distinct(SubID) From Workflow Where ID >= @LastSeenWorkflowID

If I execute this request instead, PK is used

 Select Distinct(SubID) From Workflow Where ID >= 786400000

I suspect the problem is using the parameter value in the request (what I need to do). I really don't want to use an index hint. Is there a workaround for this?

+6

sql-server tsql

Randy minder Jul 07 '11 at 19:10

source share

3 answers

Assuming your PC is an identifier OR is always greater than 0, perhaps you can try the following:

 Select Distinct(SubID) From Workflow Where ID >= @LastSeenWorkflowID And ID > 0

By adding a second condition, it can force the optimizer to use index search.

+2

G mastros Jul 08 '11 at 0:38

source share

This is a classic example of a local variable that creates a suboptimal plan.

You must use OPTION (RECOMPILE) to compile your query with the actual value of the ID parameter.

See my blog entry for more information: http://www.sqlbadpractices.com/using-local-variables-in-t-sql-queries/

0

Francois Jul 15 '11 at 15:44

source share

Remus Rusanu · Accepted Answer · 2011-07-08T15:05:26+0000

Please publish the execution plan, as well as the exact definition of the table, including all indexes.

When you use a variable, the optimizer does not know what selectivity the request will have, @LastSeenWorkflowID can filter everything except the very last few lines in Workflow, or it can include all of them. The generated plan should work in both situations. There is a threshold at which a range of searches by a clustered index becomes more expensive than a full scan by a non-clustered index, simply because the clustered index is much wider (it includes each column at the sheet level) and therefore has so many pages to repeat. The generated plan, which considers the value for @LastSeenWorkflowID to be unknown, probably crosses this threshold when evaluating the cost of searching for a clustered index and, as such, chooses to scan by a non-clustered index.

You can specify a narrow index specifically designed for this query:

 CREATE INDEX WorkflowSubId ON Workflow(ID, SubId);

or

 CREATE INDEX WorkflowSubId ON Workflow(ID) INCLUDE (SubId);

Such an index is too good for your query, regardless of the value of @LastSeenWorkflowID.

Why is the query not using a specific index?

More articles: