Very slow subqueries when using "NOT IN"

I am creating reports from the data in a large existing Access database (about 500 MB after a compact and repair), and I am having problems with a slow subquery.

The database has a large table that contains a record of each customer’s purchase. Here is a simple query that finds customers who bought a blue widget. It completes in a few seconds and returns about ten thousand records.

SELECT DISTINCT CustomerId FROM ProductSales WHERE Product = 'BLUE' 

Here is a query that tries to find customers who bought a blue widget but not a red widget. It takes about an hour.

SELECT DISTINCT CustomerId
FROM ProductSales
WHERE Product = 'BLUE'
  AND CustomerId NOT IN (
    SELECT CustomerId FROM ProductSales WHERE Product = 'RED'
  )
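To pin down what the two queries return, here is a small sketch that runs them with Python's built-in sqlite3 module. SQLite is only a stand-in for Access (it is a different engine, so this says nothing about Jet/ACE performance), and the rows below are invented for illustration.

```python
import sqlite3

# Hypothetical sample data: (CustomerId, Product) rows standing in for ProductSales.
ROWS = [
    (1, "BLUE"), (1, "RED"),    # bought both, so excluded by the NOT IN query
    (2, "BLUE"), (2, "BLUE"),   # bought BLUE twice, never RED
    (3, "RED"),                 # RED only
    (4, "BLUE"), (4, "GREEN"),  # BLUE plus something else, never RED
]

def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ProductSales (CustomerId INTEGER, Product TEXT)")
    con.executemany("INSERT INTO ProductSales VALUES (?, ?)", ROWS)
    return con

BLUE_QUERY = "SELECT DISTINCT CustomerId FROM ProductSales WHERE Product = 'BLUE'"

NOT_IN_QUERY = """
SELECT DISTINCT CustomerId FROM ProductSales
WHERE Product = 'BLUE'
  AND CustomerId NOT IN (SELECT CustomerId FROM ProductSales WHERE Product = 'RED')
"""

def run(con, sql):
    # Sort so the result does not depend on the engine's row order.
    return sorted(row[0] for row in con.execute(sql))

con = make_db()
print(run(con, BLUE_QUERY))    # [1, 2, 4]
print(run(con, NOT_IN_QUERY))  # [2, 4]
```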

Is there a way to rewrite the second query so that it takes a few minutes instead of an hour?

2 answers

Access cannot use an index for NOT IN, so that query is bound to be slow. With an index on CustomerId, the following query should be much faster, because the database engine can use the index.

SELECT DISTINCT blue.CustomerId
FROM ProductSales AS blue
LEFT JOIN (
    SELECT CustomerId FROM ProductSales WHERE Product = 'RED'
) AS red ON blue.CustomerId = red.CustomerId
WHERE blue.Product = 'BLUE' AND red.CustomerId Is Null;
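A minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for Access, that creates the CustomerId index and checks on invented sample rows that the LEFT JOIN ... IS NULL form returns the same customers as the NOT IN form. The index name is made up, and timings on SQLite would not reflect Access behaviour.

```python
import sqlite3

# Hypothetical sample rows standing in for ProductSales.
ROWS = [(1, "BLUE"), (1, "RED"), (2, "BLUE"), (2, "BLUE"),
        (3, "RED"), (4, "BLUE"), (4, "GREEN")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ProductSales (CustomerId INTEGER, Product TEXT)")
con.executemany("INSERT INTO ProductSales VALUES (?, ?)", ROWS)
# Index on CustomerId as the answer recommends; the index name is hypothetical.
con.execute("CREATE INDEX idx_ProductSales_CustomerId ON ProductSales (CustomerId)")

NOT_IN = """
SELECT DISTINCT CustomerId FROM ProductSales
WHERE Product = 'BLUE'
  AND CustomerId NOT IN (SELECT CustomerId FROM ProductSales WHERE Product = 'RED')
"""

# Anti-join: a BLUE row survives only if no RED row matched, i.e. red.CustomerId is NULL.
LEFT_JOIN = """
SELECT DISTINCT blue.CustomerId
FROM ProductSales AS blue
LEFT JOIN (SELECT CustomerId FROM ProductSales WHERE Product = 'RED') AS red
  ON blue.CustomerId = red.CustomerId
WHERE blue.Product = 'BLUE' AND red.CustomerId IS NULL
"""

def ids(sql):
    return sorted(row[0] for row in con.execute(sql))

print(ids(NOT_IN), ids(LEFT_JOIN))  # [2, 4] [2, 4]
```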

You could also try a NOT EXISTS approach, but index use is not guaranteed there either. See also David Fenton's comment on this answer, which discusses the performance implications in more detail.
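One way the NOT EXISTS variant could be written is sketched below, again using SQLite through Python's sqlite3 as a stand-in for Access, with invented sample rows. Whether Access uses the index for this form is not something this sketch can show.

```python
import sqlite3

# Hypothetical sample rows standing in for ProductSales.
ROWS = [(1, "BLUE"), (1, "RED"), (2, "BLUE"), (2, "BLUE"),
        (3, "RED"), (4, "BLUE"), (4, "GREEN")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ProductSales (CustomerId INTEGER, Product TEXT)")
con.executemany("INSERT INTO ProductSales VALUES (?, ?)", ROWS)
con.execute("CREATE INDEX idx_ProductSales_CustomerId ON ProductSales (CustomerId)")

# Correlated NOT EXISTS: keep a BLUE row only if the same customer has no RED row.
NOT_EXISTS = """
SELECT DISTINCT blue.CustomerId
FROM ProductSales AS blue
WHERE blue.Product = 'BLUE'
  AND NOT EXISTS (
    SELECT * FROM ProductSales AS red
    WHERE red.Product = 'RED' AND red.CustomerId = blue.CustomerId
  )
"""

def ids(sql):
    return sorted(row[0] for row in con.execute(sql))

print(ids(NOT_EXISTS))  # [2, 4]
```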


Add an index, of course, if you don't have one. If that alone doesn't help, the problem is probably that many customers ordered RED but relatively few ordered BLUE; this (untested) query tries to address that by first narrowing the RED customers down to those who also bought BLUE.

SELECT DISTINCT ps.CustomerId
FROM ProductSales AS ps
LEFT JOIN (
    SELECT DISTINCT r.CustomerId AS cid
    FROM ProductSales AS r
    INNER JOIN (
        SELECT DISTINCT CustomerId FROM ProductSales WHERE Product = 'BLUE'
    ) AS foo ON r.CustomerId = foo.CustomerId
    WHERE r.Product = 'RED'
) AS bar ON ps.CustomerId = bar.cid
WHERE ps.Product = 'BLUE' AND bar.cid IS NULL;
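As a sanity check of the idea in this answer, here is a sketch, assuming SQLite via Python's sqlite3 as a stand-in for Access and invented sample rows, that restricts the RED set to BLUE buyers before the anti-join and confirms it returns the same customers as the original NOT IN query. It checks correctness on toy data only, not speed.

```python
import sqlite3

# Hypothetical sample rows standing in for ProductSales.
ROWS = [(1, "BLUE"), (1, "RED"), (2, "BLUE"), (2, "BLUE"),
        (3, "RED"), (4, "BLUE"), (4, "GREEN")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ProductSales (CustomerId INTEGER, Product TEXT)")
con.executemany("INSERT INTO ProductSales VALUES (?, ?)", ROWS)

NOT_IN = """
SELECT DISTINCT CustomerId FROM ProductSales
WHERE Product = 'BLUE'
  AND CustomerId NOT IN (SELECT CustomerId FROM ProductSales WHERE Product = 'RED')
"""

# bar = RED buyers who also bought BLUE (the INNER JOIN with foo shrinks the RED set);
# the outer LEFT JOIN ... IS NULL then drops those customers from the BLUE list.
RESTRICTED = """
SELECT DISTINCT ps.CustomerId
FROM ProductSales AS ps
LEFT JOIN (
  SELECT DISTINCT r.CustomerId AS cid
  FROM ProductSales AS r
  INNER JOIN (SELECT DISTINCT CustomerId FROM ProductSales WHERE Product = 'BLUE') AS foo
    ON r.CustomerId = foo.CustomerId
  WHERE r.Product = 'RED'
) AS bar ON ps.CustomerId = bar.cid
WHERE ps.Product = 'BLUE' AND bar.cid IS NULL
"""

def ids(sql):
    return sorted(row[0] for row in con.execute(sql))

print(ids(RESTRICTED), ids(NOT_IN))  # [2, 4] [2, 4]
```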

Source: https://habr.com/ru/post/894658/
