How to make this SQL query using IN (with many numeric identifiers) more efficient?

I've had this query running for more than an hour, so I know I'm probably doing something wrong. Is there an efficient way to rewrite this query?

 SELECT RespondentID, MIN(SessionID) AS 'SID'
 FROM BIG_Sessions (nolock)
 WHERE RespondentID IN (
     1418283, 1419863, 1421188, 1422101, 1431384, 1435526,
     1437284, 1441394, /* etc etc THOUSANDS */ 1579244
 )
 AND EntryDate BETWEEN '07-11-2011' AND '07-31-2012'
 GROUP BY RespondentID

I know my date range is quite large, but I can't change that part (the dates are spread all over the place).

Also, the reason for MIN(SessionID) is that otherwise we get many SessionIDs for each respondent, and one is enough (SessionID is an alphanumeric identifier like ach2a23a-adhsdx123 ..., so MIN simply picks the alphabetically first one).

thanks

2 answers
  • Put your thousands of numbers in a temporary table.
  • Put an index on the number field in that table.
  • Make sure the RespondentID field in BIG_SESSIONS is indexed.
  • Inner join the two tables.

eg:

 SELECT BIG_Sessions.RespondentID, MIN(SessionID) AS 'SID'
 FROM BIG_Sessions (nolock)
 INNER JOIN RespondentsFilterTable
     ON BIG_Sessions.RespondentID = RespondentsFilterTable.RespondentID
 WHERE EntryDate BETWEEN '07-11-2011' AND '07-31-2012'
 GROUP BY BIG_Sessions.RespondentID
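The filter table itself isn't created in the snippet above. A minimal sketch of the setup in T-SQL, assuming a temp table is acceptable (the table and index names here are illustrative, not from the original answer):

 -- Hypothetical setup for the filter table (names are illustrative)
 CREATE TABLE #RespondentsFilterTable (RespondentID INT NOT NULL);

 INSERT INTO #RespondentsFilterTable (RespondentID)
 VALUES (1418283), (1419863), (1421188); -- ... thousands more rows

 -- Index the lookup column so the join can seek instead of scan
 CREATE CLUSTERED INDEX IX_Filter_RespondentID
     ON #RespondentsFilterTable (RespondentID);

With the filter values in an indexed table, the optimizer can join rather than evaluate a huge IN list per row.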

You can also add indexes on EntryDate and SessionID, but if you insert into BIG_Sessions frequently, this may be counterproductive elsewhere, since every index slows down writes.
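One way to serve this particular query with a single index is a covering index; a sketch, where the index name and column choice are assumptions rather than part of the original answer:

 -- Hypothetical covering index: seek on RespondentID, filter on EntryDate,
 -- and read SessionID from the index without touching the base table
 CREATE INDEX IX_BIG_Sessions_Resp_Entry
     ON BIG_Sessions (RespondentID, EntryDate)
     INCLUDE (SessionID);

The INCLUDE clause keeps SessionID out of the index key but available to the query, which is the trade-off to weigh against slower inserts.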

In general, you can get hints on how to improve query performance by looking at the estimated (or, if possible, actual) execution plan.


If the smallest and largest identifiers in the IN list are known in advance, then, depending on how the IDs are distributed, adding RespondentID > [smallest_known_id - 1] AND RespondentID < [largest_known_id + 1] alongside the IN clause can help narrow the range the engine has to scan.
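A sketch of that idea, using the smallest and largest IDs visible in the question's IN list as the bounds (a simple illustration, not a guaranteed win on every data distribution):

 -- 1418283 and 1579244 are the smallest/largest IDs from the question
 SELECT RespondentID, MIN(SessionID) AS 'SID'
 FROM BIG_Sessions (nolock)
 WHERE RespondentID > 1418282 AND RespondentID < 1579245
   AND RespondentID IN (1418283, 1419863, /* etc etc THOUSANDS */ 1579244)
   AND EntryDate BETWEEN '07-11-2011' AND '07-31-2012'
 GROUP BY RespondentID

The range predicate lets an index on RespondentID limit the scan to one contiguous slice before the IN list is checked.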


Source: https://habr.com/ru/post/921807/