MySQL: removing duplicates more efficiently?

Question

Although this has been asked before, I am curious whether this is still the best way to clear duplicate entries from a large (3M rows and growing) table. After each bulk insert I run the query below to keep the table clean, but it is starting to take a very long time.

A row counts as a duplicate only when 3 columns match. The other columns are auto-increment values, unique identifiers, sources, etc.

Here is what I am running now:

DELETE n1 FROM main n1, main n2 WHERE n1.id < n2.id AND n1.col1 = n2.col1 AND n1.col2 = n2.col2 AND n1.col3 = n2.col3

Can this be sped up, or is it as good as it gets?

Thanks for any help / understanding!

+4

mysql

user1145643 Oct 31 '12 at 10:38

2 answers

I agree with the other posters: you can add a UNIQUE KEY to prevent duplicates.

If you want to remove the duplicates that already exist, you can use this query:

 DELETE t1 FROM main t1 JOIN (SELECT MIN(id) id, col1, col2, col3 FROM main GROUP BY col1, col2, col3) t2 ON t1.id <> t2.id AND t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.col3 = t2.col3;
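Note that the query above keeps the row with the lowest id in each group, while the query in the question keeps the highest; swapping MIN(id) for MAX(id) would match the question's behavior. Before deleting anything, it can help to look at the duplicate groups and to give MySQL an index it can use for the GROUP BY and the join. The following is only a sketch: the index name idx_col1_col2_col3 is made up for illustration, and it assumes no index on these three columns exists yet.

```sql
-- List each duplicate group, how many copies it has,
-- and the id that the MIN(id) variant of the DELETE would keep.
SELECT col1, col2, col3, COUNT(*) AS copies, MIN(id) AS keep_id
FROM main
GROUP BY col1, col2, col3
HAVING COUNT(*) > 1;

-- Hypothetical helper index (non-unique, so it can be created
-- while duplicates are still present).
ALTER TABLE main ADD INDEX idx_col1_col2_col3 (col1, col2, col3);
```

Whether the index actually speeds up the delete depends on the data and the MySQL version, so checking the plan with EXPLAIN on a copy of the table is a reasonable first step.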

+1

Devart Nov 01 '12 at 7:23

Nesim raazon · Accepted Answer · 2012-10-31T22:52:43+0000

Add a unique index on columns col1, col2 and col3 of your table, like this:

 ALTER TABLE `main` ADD UNIQUE INDEX `col1_col2_col3` (`col1`, `col2`, `col3`);

This will prevent duplicate rows from being inserted into your table.

For example, after you insert these values:

 INSERT INTO `main` (`col1`, `col2`, `col3`) VALUES (1, 11, 111);

you cannot insert them again; this statement fails with a duplicate-entry error:

 INSERT INTO `main` (`col1`, `col2`, `col3`) VALUES (1, 11, 111);
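Because the question mentions bulk inserts, a duplicate-entry error in the middle of a batch may be unwelcome. One possible way to handle that, assuming the unique index above is in place, is INSERT IGNORE, which skips rows that would violate the index and reports them as warnings instead of errors. The values below are illustrative only.

```sql
-- Rows whose (col1, col2, col3) already exist in the table, or that
-- repeat an earlier row of the same statement, are skipped.
INSERT IGNORE INTO `main` (`col1`, `col2`, `col3`)
VALUES (1, 11, 111),
       (1, 11, 111),
       (2, 22, 222);

-- Inspect which rows were skipped.
SHOW WARNINGS;
```

Keep in mind that IGNORE also downgrades some other errors (for example, data conversion problems) to warnings, so it is worth checking the warnings after a load. Also note that a unique index treats NULL values as distinct, so rows with NULL in any of the three columns would not be caught.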

With the right unique indexes, you don’t have to worry about duplicate entries later.
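If the table already contains duplicates, the ALTER TABLE statement above will fail with a duplicate-entry error (older MySQL versions offered ALTER IGNORE TABLE for this, but it was removed in MySQL 5.7.4). One alternative to deleting in place is to copy the rows into a fresh table that already has the unique index. This is a rough sketch under several assumptions: the names main_dedup and main_old are made up, nothing writes to main while the copy runs, and no foreign keys reference the table.

```sql
-- Empty copy of the table structure (columns and existing indexes).
CREATE TABLE main_dedup LIKE main;

-- Add the unique index before loading any rows.
ALTER TABLE main_dedup
  ADD UNIQUE INDEX `col1_col2_col3` (`col1`, `col2`, `col3`);

-- Copy rows in id order; later duplicates are skipped,
-- so the lowest id in each group is the one that survives.
INSERT IGNORE INTO main_dedup
SELECT * FROM main ORDER BY id;

-- Swap the tables in a single statement.
RENAME TABLE main TO main_old, main_dedup TO main;

-- After verifying the result:
-- DROP TABLE main_old;
```

After the swap, the unique index keeps new duplicates out, so the periodic cleanup query should no longer be needed.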
