How to make this SQL query using IN (with many numeric identifiers) more efficient?

I've had this query running for more than an hour, so I know I'm probably doing something wrong. Is there an efficient way to rewrite this query?

 SELECT RespondentID, MIN(SessionID) AS 'SID'
 FROM BIG_Sessions (nolock)
 WHERE RespondentID IN (
     1418283, 1419863, 1421188, 1422101, 1431384, 1435526,
     1437284, 1441394, /* etc etc THOUSANDS */ 1579244
 )
 AND EntryDate BETWEEN '07-11-2011' AND '07-31-2012'
 GROUP BY RespondentID

I know my date range is quite large, but I can't change that part (the dates are spread all over the place).

Also, the reason for MIN(SessionID) is that otherwise we get many SessionIDs for each respondent, and one is enough (SessionID is an alphanumeric identifier like ach2a23a-adhsdx123 ..., so MIN simply picks the alphabetically first one).

thanks

2 answers
  • Put your thousands of numbers in a temporary table.
  • Put an index on the number field in that table.
  • Make sure the RespondentID field in BIG_SESSIONS is indexed.
  • Inner join the two tables.

eg:

 SELECT BIG_Sessions.RespondentID, MIN(SessionID) AS 'SID'
 FROM BIG_Sessions (nolock)
 INNER JOIN RespondentsFilterTable
     ON BIG_Sessions.RespondentID = RespondentsFilterTable.RespondentID
 WHERE EntryDate BETWEEN '07-11-2011' AND '07-31-2012'
 GROUP BY BIG_Sessions.RespondentID
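The filter table itself isn't created in the snippet above. A minimal sketch of the setup in T-SQL, assuming a temp table is acceptable (the table and index names here are illustrative, not from the original answer):

 -- Hypothetical setup for the filter table (names are illustrative)
 CREATE TABLE #RespondentsFilterTable (RespondentID INT NOT NULL);

 INSERT INTO #RespondentsFilterTable (RespondentID)
 VALUES (1418283), (1419863), (1421188); -- ... thousands more rows

 -- Index the lookup column so the join can seek instead of scan
 CREATE CLUSTERED INDEX IX_Filter_RespondentID
     ON #RespondentsFilterTable (RespondentID);

With the filter values in an indexed table, the optimizer can join rather than evaluate a huge IN list per row.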

You can also add indexes on EntryDate and SessionID, but if you insert into BIG_Sessions frequently, this may be counterproductive elsewhere, since every index slows down writes.
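One way to serve this particular query with a single index is a covering index; a sketch, where the index name and column choice are assumptions rather than part of the original answer:

 -- Hypothetical covering index: seek on RespondentID, filter on EntryDate,
 -- and read SessionID from the index without touching the base table
 CREATE INDEX IX_BIG_Sessions_Resp_Entry
     ON BIG_Sessions (RespondentID, EntryDate)
     INCLUDE (SessionID);

The INCLUDE clause keeps SessionID out of the index key but available to the query, which is the trade-off to weigh against slower inserts.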

In general, you can get hints on how to improve query performance by looking at the estimated (or, if possible, actual) execution plan.


If the smallest and largest identifiers in the IN list are known in advance, then, depending on how the IDs are distributed, adding RespondentID > [smallest_known_id - 1] AND RespondentID < [largest_known_id + 1] alongside the IN clause can help narrow the range the engine has to scan.
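A sketch of that idea, using the smallest and largest IDs visible in the question's IN list as the bounds (a simple illustration, not a guaranteed win on every data distribution):

 -- 1418283 and 1579244 are the smallest/largest IDs from the question
 SELECT RespondentID, MIN(SessionID) AS 'SID'
 FROM BIG_Sessions (nolock)
 WHERE RespondentID > 1418282 AND RespondentID < 1579245
   AND RespondentID IN (1418283, 1419863, /* etc etc THOUSANDS */ 1579244)
   AND EntryDate BETWEEN '07-11-2011' AND '07-31-2012'
 GROUP BY RespondentID

The range predicate lets an index on RespondentID limit the scan to one contiguous slice before the IN list is checked.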


Source: https://habr.com/ru/post/921807/