Slow query with unexpected index scan

I have this query:

 SELECT *
 FROM sample
 INNER JOIN test ON sample.sample_number = test.sample_number
 INNER JOIN result ON test.test_number = result.test_number
 WHERE sampled_date BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00'

The largest table here is RESULT, containing 11.1M records. The other two tables have about 1M records each.

This query is slow (over 10 minutes) and returns about 800 records. The execution plan shows a clustered index scan (over its PRIMARY KEY, result.result_number, which does not actually take part in the query) across all 11M records. RESULT.TEST_NUMBER is the cluster primary key.

If I change the range to 2010-03-17 09:00 through 2010-03-17 10:00, I get about 40 records. It runs in 300 ms, and the plan shows an index seek (on the result.test_number index).

If I replace * in the SELECT clause with result.test_number (which is covered by the index), everything is fast in the first case as well. This points to HDD I/O problems, but it does not explain the plan change.

So any ideas?

UPDATE: sampled_date is in the sample table and is covered by an index. The other fields in this query, test.sample_number and result.test_number, are also covered by indexes.

UPDATE 2: Obviously, SQL Server for some reason does not want to use the index.

I did a small experiment: I removed the INNER JOIN with result, selected all the test.test_number values, and after that ran

 SELECT * FROM RESULT WHERE TEST_NUMBER IN (...) 

It certainly works fast. But I can't understand what the difference is, and why the query optimizer chooses such an inappropriate way to select the data in the first case.

UPDATE 3: After backing up the database and restoring it under a new name, both queries run quickly, as expected, even on much wider ranges ...

So, are there any special commands for cleanup or optimization, whatever that means? :-(

+4
3 answers

A few things to try:

  • Update statistics
  • Add hints to the query about which index to use (in SQL Server you can say WITH (INDEX(myindex)) after specifying the table)

EDIT: You noted that copying the database made it work, which tells me that the index statistics were out of date. You can update them regularly with UPDATE STATISTICS mytable.

Use EXEC sp_updatestats to update the entire database.
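As a rough sketch of both suggestions, using the table names from the question: the dbo schema and the index name IX_result_test_number are assumptions made for illustration, so substitute whatever actually exists in your database.

```sql
-- When were the statistics on the result table last updated?
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.result');

-- Refresh statistics for the tables in the query.
-- FULLSCAN reads every row; it is slower but more precise than sampling.
UPDATE STATISTICS dbo.sample WITH FULLSCAN;
UPDATE STATISTICS dbo.test WITH FULLSCAN;
UPDATE STATISTICS dbo.result WITH FULLSCAN;

-- Force a specific index with a table hint.
-- IX_result_test_number is a hypothetical index name.
SELECT *
FROM sample
INNER JOIN test ON sample.sample_number = test.sample_number
INNER JOIN result WITH (INDEX(IX_result_test_number))
        ON test.test_number = result.test_number
WHERE sample.sampled_date BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00';
```

The hint is best treated as a diagnostic: if the hinted query is fast, that supports the stale-statistics theory, and refreshing the statistics is usually the better long-term fix.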

+7

The first thing I would do is specify the exact columns I want and see if the problem goes away. I doubt that you need all the columns from all three tables.

It looks like it is having trouble fetching all the rows from the result table. How big is a row? See how large all the data in the table is and divide it by the number of rows. Right-click on the table -> Properties ..., Storage tab.

Try putting the WHERE clause in a subquery to get it evaluated first:
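The same numbers can also be pulled with a query instead of the Properties dialog. This is only a sketch; it assumes the table lives in the dbo schema, and the per-row figure is approximate because it counts in-row data pages only (8 KB each).

```sql
-- Row count plus reserved, data, and index sizes for the table.
EXEC sp_spaceused N'dbo.result';

-- Approximate bytes per row from the heap or clustered index.
SELECT SUM(ps.row_count) AS row_count,
       SUM(ps.in_row_data_page_count) * 8192.0
           / NULLIF(SUM(ps.row_count), 0) AS approx_bytes_per_row
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID(N'dbo.result')
  AND ps.index_id IN (0, 1);  -- 0 = heap, 1 = clustered index
```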

 SELECT *
 FROM (SELECT * FROM sample
       WHERE sampled_date BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00') s
 INNER JOIN test ON s.sample_number = test.sample_number
 INNER JOIN result ON test.test_number = result.test_number

Or this may work better if you expect a small number of samples:

 SELECT *
 FROM sample
 INNER JOIN test ON sample.sample_number = test.sample_number
 INNER JOIN result ON test.test_number = result.test_number
 WHERE sample.sample_ID IN (
     SELECT sample_ID FROM sample
     WHERE sampled_date BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00'
 )

0

If you do SELECT *, you want all the data from the table. The data for the table is in the clustered index: the leaf nodes of the clustered index are the data pages.

So, if you want all these data pages anyway, and since you are joining 1 million rows to 11 million rows (1 out of 11 is not very selective for SQL Server), using an index to find the rows and then doing a bookmark lookup into the actual data pages for each row found may simply not be very efficient. SQL Server therefore uses a clustered index scan instead.

So, long story short: select only those columns that you really need! That way you give SQL Server a chance to use an index, do a seek there, and find the necessary data.

If you select only three or four columns, the likelihood that SQL Server will find and use an index containing those columns is much higher than if you request all the data from all the tables involved.
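For illustration, here is the question's query narrowed to columns that are already named in the question. Which columns you actually need is an assumption here; the point is only that a short column list can often be answered from the existing indexes without touching every data page of result.

```sql
-- Narrow column list instead of SELECT *.
-- All four columns appear in the question; pick the ones you really need.
SELECT sample.sample_number,
       sample.sampled_date,
       test.test_number,
       result.result_number
FROM sample
INNER JOIN test ON sample.sample_number = test.sample_number
INNER JOIN result ON test.test_number = result.test_number
WHERE sample.sampled_date BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00';
```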

Another option is to try to express this with a subquery, using, for example, a Common Table Expression, which would grab the data from the two smaller tables and reduce the number of rows even further, and then join that small result against the main table. If you have a small result set of 40 or 800 rows (instead of two tables with 1 million rows each), then SQL Server may be more likely to use a clustered index seek and do a bookmark lookup for 40 or 800 rows, rather than doing a full clustered index scan.
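A minimal sketch of that idea, using the tables and columns from the question (the CTE name matching_tests is made up for this example):

```sql
-- Reduce the two smaller tables to the matching test numbers first,
-- then join that small set against the 11M-row result table.
WITH matching_tests AS (
    SELECT t.test_number
    FROM sample AS s
    INNER JOIN test AS t ON s.sample_number = t.sample_number
    WHERE s.sampled_date BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00'
)
SELECT r.*
FROM matching_tests AS mt
INNER JOIN result AS r ON mt.test_number = r.test_number;
```

Keep in mind that SQL Server does not materialize a CTE; the optimizer expands it into the outer query, so the plan may come out the same as for the original join. If it does, loading the test numbers into a temporary table first, much like the manual experiment in the question, forces the two-step evaluation.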

0

Source: https://habr.com/ru/post/1304419/

