How to find rows in one table that do not have a matching row in another table

I have a 1: 1 ratio between two tables. I want to find all rows in table A that do not have the corresponding row in table B. I use this query:

SELECT id FROM tableA WHERE id NOT IN (SELECT id FROM tableB) ORDER BY id desc 

id is the primary key in both tables. Besides the primary key indexes, I also have an index on tableA (id desc).

Using H2 (embedded Java database), this results in a full scan of tableB. I want to avoid a full table scan.

How can I rewrite this query for quick start? Which index should I use?

+48
optimization sql h2
Sep 12 '09 at 16:04
source share
5 answers
 select tableA.id from tableA left outer join tableB on (tableA.id = tableB.id) where tableB.id is null order by tableA.id desc 

If your db knows how to cross index indexes, this will affect the primary key index

+72
Sep 12 '09 at 16:09
source share
β€” -

You can also use exists , because sometimes it is faster than left join . You will need to compare them to find out which one you want to use.

 select id from tableA a where not exists (select 1 from tableB b where b.id = a.id) 

To show that exists can be more efficient than left join , here are the plans for these queries in SQL Server 2008:

left join - total cost of the subtree: 1.09724:

left join

exists - total value of the subtree: 1.07421:

exists

+25
Sep 12 '09 at 16:15
source share

You must check each ID in table A for every identifier in table B. Full-featured RDBMS (for example, Oracle) will be able to optimize it in FULL FAST SCAN INDEX FULL, and not touch the table at all. I don't know if the H2 optimizer is as smart as that.

H2 supports MINUS syntax, so you should try this

 select id from tableA minus select id from tableB order by id desc 

It can work faster; it is definitely worth benchmarking.

+5
Sep 12 '09 at 16:18
source share

For my small dataset, Oracle provides almost all of these queries with the exact same plan that uses primary key indexes without touching the table. The exception is the MINUS version, which manages to make fewer agreed failures, despite the higher cost of the plan.

 --Create Sample Data. drop table tableA; drop table tableB; create table tableA as ( select rownum-1 ID, chr(rownum-1+70) bb, chr(rownum-1+100) cc from dual connect by rownum<=4 ); create table tableB as ( select rownum ID, chr(rownum+70) data1, chr(rownum+100) cc from dual UNION ALL select rownum+2 ID, chr(rownum+70) data1, chr(rownum+100) cc from dual connect by rownum<=3 ); alter table tableA Add Primary Key (ID); alter table tableB Add Primary Key (ID); --View Tables. select * from tableA; select * from tableB; --Find all rows in tableA that don't have a corresponding row in tableB. --Method 1. SELECT id FROM tableA WHERE id NOT IN (SELECT id FROM tableB) ORDER BY id DESC; --Method 2. SELECT tableA.id FROM tableA LEFT JOIN tableB ON (tableA.id = tableB.id) WHERE tableB.id IS NULL ORDER BY tableA.id DESC; --Method 3. SELECT id FROM tableA a WHERE NOT EXISTS (SELECT 1 FROM tableB b WHERE b.id = a.id) ORDER BY id DESC; --Method 4. SELECT id FROM tableA MINUS SELECT id FROM tableB ORDER BY id DESC; 
+4
Dec 10 2018-10-10
source share

I can’t say which of these methods will work best on H2 (or even if they all work), but I wrote an article that details all the (good) methods available in TSQL. You can give them a chance and see if any of them work:

http://code.msdn.microsoft.com/SQLExamples/Wiki/View.aspx?title=QueryBasedUponAbsenceOfData&referringTitle=Home

+3
Sep 12 '09 at 16:17
source share



All Articles