MySQL: a more efficient way to remove duplicates?

Although this question has been asked before, I am curious whether this is still the best way to clear duplicate entries from a large (3M rows and growing) table. After each bulk insert I run the statement below to keep the table clean, but it is starting to take a very long time.

A duplicate row is defined by 3 columns only. The other columns are either auto-incremented, unique identifiers, source fields, etc.

Here's what I'm running now:

DELETE n1 FROM main n1, main n2 WHERE n1.id < n2.id AND n1.col1 = n2.col1 AND n1.col2 = n2.col2 AND n1.col3 = n2.col3 

Can I speed this up, or is this as good as it gets?

Thanks for any help / understanding!

2 answers

Add a unique index to your table on columns col1, col2 and col3, like this:

 ALTER TABLE `main` ADD UNIQUE INDEX `col1_col2_col3` (`col1`, `col2`, `col3`); 

This will prevent duplicate rows from being inserted into your table.

For example, after you insert this row:

 INSERT INTO `main` (`col1`, `col2`, `col3`) VALUES (1, 11, 111); 

you cannot insert it again; you will get a duplicate-entry error:

 INSERT INTO `main` (`col1`, `col2`, `col3`) VALUES (1, 11, 111); 

With the right unique indexes, you don't have to worry about duplicate entries later.
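
One consequence for the bulk inserts described in the question: once the unique index exists, a plain multi-row INSERT fails with a duplicate-entry error (1062) as soon as it hits a duplicate. A minimal sketch of two ways around that, assuming the same `main` table, that all three columns are NOT NULL (a UNIQUE index allows multiple NULL values), and that silently skipping duplicates is acceptable:

```sql
-- Option 1: skip rows that would violate the unique index.
-- Caution: INSERT IGNORE also turns some other errors into warnings.
INSERT IGNORE INTO `main` (`col1`, `col2`, `col3`)
VALUES (1, 11, 111), (1, 11, 111), (2, 22, 222);

-- Option 2: treat a duplicate as a no-op update of the existing row.
INSERT INTO `main` (`col1`, `col2`, `col3`)
VALUES (1, 11, 111)
ON DUPLICATE KEY UPDATE `col1` = `col1`;
```

Either way the duplicates never reach the table, so the cleanup DELETE after each bulk insert should no longer be needed.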


I agree with the other poster: you can add a UNIQUE KEY to prevent duplicates.

If you want to remove the existing duplicates, you can use this query, which keeps the row with the lowest id in each group:

 DELETE t1 FROM main t1 JOIN (SELECT MIN(id) id, col1, col2, col3 FROM main GROUP BY col1, col2, col3) t2 ON t1.id <> t2.id AND t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.col3 = t2.col3; 
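
The two answers fit together: the unique index cannot be created while duplicates still exist (the ALTER TABLE fails with a duplicate-entry error), so a one-time cleanup has to run first. A possible sequence, assuming the table and column names from the question:

```sql
-- 1. Inspect the duplicate groups before deleting anything.
SELECT col1, col2, col3, COUNT(*) AS copies
FROM main
GROUP BY col1, col2, col3
HAVING COUNT(*) > 1;

-- 2. One-time cleanup: keep the row with the smallest id in each group.
DELETE t1
FROM main t1
JOIN (
    SELECT MIN(id) AS id, col1, col2, col3
    FROM main
    GROUP BY col1, col2, col3
) t2
  ON t1.id <> t2.id
 AND t1.col1 = t2.col1
 AND t1.col2 = t2.col2
 AND t1.col3 = t2.col3;

-- 3. Enforce uniqueness from now on, so the cleanup does not have to be repeated.
ALTER TABLE main ADD UNIQUE INDEX col1_col2_col3 (col1, col2, col3);
```

Note that the query in the question keeps the row with the highest id in each group, while this one keeps the lowest; replacing MIN(id) with MAX(id) in step 2 would preserve the original behaviour.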

Source: https://habr.com/ru/post/1443332/