Fastest SQL Script to Get Duplicates

What is an example of fast SQL for finding duplicates in datasets with hundreds of thousands of records? Usually I use something like:

SELECT afield1, afield2 FROM afile a WHERE 1 < (SELECT count(afield1) FROM afile b WHERE a.afield1 = b.afield1); 

But it is rather slow.

performance sql scripting duplicates
Oct 13 '08 at 9:36
5 answers

This is a more direct way:

 select afield1, count(afield1) from afile group by afield1 having count(afield1) > 1
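A minimal sketch of the GROUP BY / HAVING approach, run against SQLite through Python's sqlite3 module as a stand-in engine. The table afile, its two TEXT columns and the sample rows are hypothetical; the ORDER BY is only there to make the output deterministic. Note that this form returns each duplicated key once with its count, not the individual rows.

```python
import sqlite3

# Hypothetical sample data: "a" appears three times, "c" twice, "b" once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile VALUES (?, ?)",
    [("a", "x"), ("b", "y"), ("a", "z"), ("c", "w"), ("a", "v"), ("c", "u")],
)


def duplicate_keys(conn):
    """Return (afield1, count) for each afield1 value occurring more than once."""
    sql = """
        SELECT afield1, COUNT(afield1)
        FROM afile
        GROUP BY afield1
        HAVING COUNT(afield1) > 1
        ORDER BY afield1  -- added only to make the output order deterministic
    """
    return conn.execute(sql).fetchall()


print(duplicate_keys(conn))  # [('a', 3), ('c', 2)]
```

The table is scanned once and grouped, instead of running a correlated count for every row as in the question's query.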
Oct 13 '08 at 9:38

You can try:

 select afield1, afield2 from afile a where afield1 in ( select afield1 from afile group by afield1 having count(*) > 1 ); 
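A sketch of the IN-subquery variant, again using SQLite via Python's sqlite3 as an assumed stand-in engine with hypothetical data. Unlike the plain GROUP BY, it returns every full row whose afield1 is duplicated. The index on afield1 is an assumption added for illustration; on large tables an index on the grouped column is usually what makes this kind of query fast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
# Hypothetical index on the key column being grouped and matched.
conn.execute("CREATE INDEX afile_afield1 ON afile (afield1)")
conn.executemany(
    "INSERT INTO afile VALUES (?, ?)",
    [("a", "x"), ("b", "y"), ("a", "z"), ("c", "w"), ("a", "v"), ("c", "u")],
)

# The inner query finds the duplicated keys once; the outer query
# returns all rows carrying one of those keys.
ROWS_SQL = """
    SELECT afield1, afield2
    FROM afile
    WHERE afield1 IN (
        SELECT afield1
        FROM afile
        GROUP BY afield1
        HAVING COUNT(*) > 1
    )
    ORDER BY afield1, afield2
"""
rows = conn.execute(ROWS_SQL).fetchall()
print(rows)  # [('a', 'v'), ('a', 'x'), ('a', 'z'), ('c', 'u'), ('c', 'w')]
```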
Oct 13 '08 at 9:39

A similar question was asked last week. There are good answers there.

SQL to search for duplicate records (within a group)

In that question, the OP was interested in all the columns (fields) of the table (file), and rows belonged to the same group when they had the same key value (afield1).

There are three types of answers:

- subqueries in the WHERE clause, like some of the other answers here;
- an inner join between the table and the groups treated as a table (my answer);
- analytic queries (something new for me).
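A sketch of the second and third kinds of answer side by side, using SQLite via Python's sqlite3 as an assumed engine and hypothetical data. The join version treats the grouped counts as a derived table; the analytic version uses the window function COUNT(*) OVER (PARTITION BY afield1), which keeps every row and attaches its group size. Window functions need SQLite 3.25 or later; other engines spell them the same way in standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile VALUES (?, ?)",
    [("a", "x"), ("b", "y"), ("a", "z"), ("c", "w"), ("a", "v"), ("c", "u")],
)

# Inner join between the table and its groups treated as a table.
JOIN_SQL = """
    SELECT a.afield1, a.afield2, g.n
    FROM afile AS a
    JOIN (
        SELECT afield1, COUNT(*) AS n
        FROM afile
        GROUP BY afield1
        HAVING COUNT(*) > 1
    ) AS g ON g.afield1 = a.afield1
    ORDER BY a.afield1, a.afield2
"""

# Analytic query: count rows per key without collapsing them, then filter.
WINDOW_SQL = """
    SELECT afield1, afield2, n
    FROM (
        SELECT afield1, afield2,
               COUNT(*) OVER (PARTITION BY afield1) AS n
        FROM afile
    ) AS t
    WHERE n > 1
    ORDER BY afield1, afield2
"""

join_rows = conn.execute(JOIN_SQL).fetchall()
window_rows = conn.execute(WINDOW_SQL).fetchall()
print(join_rows)
print(join_rows == window_rows)  # True
```

Both return every duplicated row together with how many times its key occurs, which the WHERE ... IN form does not give you directly.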

Oct 13 '08 at 12:50

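By the way, if you want to remove duplicates, I have used this:

 delete from MyTable where MyTableID in ( select max(MyTableID) from MyTable group by Thing1, Thing2, Thing3 having count(*) > 1 )

Note that this deletes only the row with the highest MyTableID in each duplicated group per run, so a group with three or more copies needs the statement repeated until it affects no rows.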
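A sketch, using SQLite via Python's sqlite3 and hypothetical rows, that compares the max-ID delete with a single-pass alternative that keeps the lowest MyTableID in every group and deletes everything else. The NOT IN / MIN variant is a commonly used pattern swapped in here for illustration, not the original poster's statement. MySQL, for one, rejects a subquery that reads the table being deleted from, so there the subquery would need rewriting.

```python
import sqlite3

# Hypothetical data: one group appears three times, two other rows are unique.
ROWS = [("p", "q", "r"), ("p", "q", "r"), ("p", "q", "r"), ("s", "t", "u"), ("s", "t", "v")]


def make_table():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (MyTableID INTEGER PRIMARY KEY,"
        " Thing1 TEXT, Thing2 TEXT, Thing3 TEXT)"
    )
    # INTEGER PRIMARY KEY assigns IDs 1..5 in insertion order on an empty table.
    conn.executemany("INSERT INTO MyTable (Thing1, Thing2, Thing3) VALUES (?, ?, ?)", ROWS)
    return conn


def ids(conn):
    return [r[0] for r in conn.execute("SELECT MyTableID FROM MyTable ORDER BY MyTableID")]


# Max-ID delete: removes only one row per duplicated group per run.
once = make_table()
once.execute("""
    DELETE FROM MyTable WHERE MyTableID IN (
        SELECT MAX(MyTableID) FROM MyTable
        GROUP BY Thing1, Thing2, Thing3
        HAVING COUNT(*) > 1)
""")
print(ids(once))  # [1, 2, 4, 5]: the triplicated group still has two rows

# Alternative: keep the lowest ID per group and delete the rest in one pass.
keep_min = make_table()
keep_min.execute("""
    DELETE FROM MyTable WHERE MyTableID NOT IN (
        SELECT MIN(MyTableID) FROM MyTable
        GROUP BY Thing1, Thing2, Thing3)
""")
print(ids(keep_min))  # [1, 4, 5]
```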
Jan 20 2018-11-11T00:

This should be fast enough (even faster if the dupeField columns are indexed).

 SELECT DISTINCT a.id, a.dupeField1, a.dupeField2 FROM TableX a JOIN TableX b ON a.dupeField1 = b.dupeField1 AND a.dupeField2 = b.dupeField2 AND a.id != b.id

The only drawback I can see is that, because the query does not use COUNT(*), it tells you only that a row appears more than once, not how many times it is repeated.
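A sketch of the self-join, with each dupeField column compared to the same column on the other side, run on SQLite via Python's sqlite3 with a hypothetical TableX and sample rows. The composite index on (dupeField1, dupeField2) is an assumption standing in for the "indexed" case; it gives the engine a way to look up matching rows for the join instead of scanning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableX (id INTEGER PRIMARY KEY, dupeField1 TEXT, dupeField2 TEXT)")
# Hypothetical composite index on the compared columns.
conn.execute("CREATE INDEX TableX_dupes ON TableX (dupeField1, dupeField2)")
conn.executemany(
    "INSERT INTO TableX (dupeField1, dupeField2) VALUES (?, ?)",
    # IDs 1..5; rows 1, 2 and 5 share the same (dupeField1, dupeField2) pair.
    [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")],
)

# Each row is matched against every *other* row with the same pair of values;
# DISTINCT collapses the multiple matches a row gets when it has several twins.
SELF_JOIN_SQL = """
    SELECT DISTINCT a.id, a.dupeField1, a.dupeField2
    FROM TableX AS a
    JOIN TableX AS b
      ON a.dupeField1 = b.dupeField1
     AND a.dupeField2 = b.dupeField2
     AND a.id <> b.id
    ORDER BY a.id
"""
dupes = conn.execute(SELF_JOIN_SQL).fetchall()
print(dupes)  # [(1, 'a', 'x'), (2, 'a', 'x'), (5, 'a', 'x')]
```

As the answer says, this identifies the repeated rows by id but does not report how many copies each has.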

Aug 21 2018-12-12T00: