Delete duplicate records without creating a temporary table

Question

I have a table with many duplicate entries:

    shop
    +----+--------+
    | id | tax_id |
    +----+--------+
    |  1 |     10 |
    |  1 |     10 |
    |  1 |     11 |
    |  2 |     10 |
    |  2 |     12 |
    |  2 |     10 |
    |  2 |     10 |
    +----+--------+

I want to delete all duplicate records without creating a temporary table. After the query runs, the table should look like this:

    shop
    +----+--------+
    | id | tax_id |
    +----+--------+
    |  1 |     10 |
    |  1 |     11 |
    |  2 |     10 |
    |  2 |     12 |
    +----+--------+

Tags: php, mysql

Lina Dec 21 '11 at 12:42

5 answers

Working solution.

    -- SQL query to find duplicates: one row per (id, tax_id) pair that occurs more than once
    SELECT id, tax_id, COUNT(*) - 1 AS cnt
    FROM shop
    GROUP BY id, tax_id
    HAVING cnt > 0;

    -- res
    +------+--------+-----+
    | id   | tax_id | cnt |
    +------+--------+-----+
    |    1 |     10 |   1 |
    |    2 |     10 |   2 |
    +------+--------+-----+

    -- Iterate through the results with your language of choice
    DELETE FROM shop WHERE id = <res id> AND tax_id = <res tax_id> LIMIT <res cnt>;

    -- res (after iterating)
    +------+--------+
    | id   | tax_id |
    +------+--------+
    |    1 |     10 |
    |    1 |     11 |
    |    2 |     12 |
    |    2 |     10 |
    +------+--------+

Since this takes two queries, you need a small piece of PHP to run the deletes:

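    // Note: the mysql_* extension was removed in PHP 7; use mysqli or PDO on current versions.
    $res = mysql_query("SELECT id, tax_id, COUNT(*) - 1 AS cnt FROM shop GROUP BY id, tax_id HAVING cnt > 0");
    while ($row = mysql_fetch_assoc($res)) {
        // Delete all but one copy of this (id, tax_id) pair
        mysql_query("DELETE FROM shop WHERE id = " . (int)$row['id']
            . " AND tax_id = " . (int)$row['tax_id']
            . " LIMIT " . (int)$row['cnt']);
    }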
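The idea behind this answer is: count the copies of each (id, tax_id) pair, then delete all but one copy with DELETE ... LIMIT. The following is a minimal PHP sketch that replays those two steps on an in-memory array instead of a real table. The row layout and the helper names countExtraCopies and deleteWithLimit are illustrative assumptions; they only model what the SELECT and the DELETE do.

```php
<?php
// Hypothetical in-memory stand-in for the shop table from the question.
$shop = [
    ['id' => 1, 'tax_id' => 10], ['id' => 1, 'tax_id' => 10], ['id' => 1, 'tax_id' => 11],
    ['id' => 2, 'tax_id' => 10], ['id' => 2, 'tax_id' => 12], ['id' => 2, 'tax_id' => 10],
    ['id' => 2, 'tax_id' => 10],
];

// Models: SELECT id, tax_id, COUNT(*) - 1 AS cnt FROM shop GROUP BY id, tax_id HAVING cnt > 0
function countExtraCopies(array $rows): array
{
    $counts = [];
    foreach ($rows as $row) {
        $key = $row['id'] . ':' . $row['tax_id'];
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    $extra = [];
    foreach ($counts as $key => $n) {
        if ($n > 1) {
            $extra[$key] = $n - 1; // copies to delete for this pair
        }
    }
    return $extra;
}

// Models: DELETE FROM shop WHERE id = ? AND tax_id = ? LIMIT $limit
function deleteWithLimit(array $rows, string $key, int $limit): array
{
    $kept = [];
    foreach ($rows as $row) {
        if ($limit > 0 && ($row['id'] . ':' . $row['tax_id']) === $key) {
            $limit--; // this copy is deleted
            continue;
        }
        $kept[] = $row;
    }
    return $kept;
}

$extra = countExtraCopies($shop);
foreach ($extra as $key => $cnt) {
    $shop = deleteWithLimit($shop, (string) $key, $cnt);
}
```

Because each DELETE removes exactly cnt = COUNT(*) - 1 rows of its pair, one copy of every pair is always left behind, whatever the other rows with the same id look like.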

Edit: This came up again recently, so for what it's worth, here is an alternative that uses a temporary column and needs no scripting language.

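    ALTER TABLE shop ADD COLUMN place INT;
    SET @i = 1;
    -- Give every row its own number
    UPDATE shop SET place = (@i := @i + 1);
    -- Keep the lowest-numbered row of each (id, tax_id) group. The extra derived
    -- table works around MySQL error 1093 (target table used in the FROM clause).
    DELETE FROM shop
    WHERE place NOT IN (
        SELECT place FROM (
            SELECT MIN(place) AS place FROM shop GROUP BY id, tax_id
        ) AS keep_rows
    );
    ALTER TABLE shop DROP COLUMN place;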
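The temporary-column variant can be modelled the same way: number every row, keep the smallest number in each (id, tax_id) group, delete the rest, and drop the numbering. This is a hypothetical in-memory sketch (dedupeByPlace is a made-up name); keeping the minimum place is one deterministic choice of which copy survives.

```php
<?php
// Hypothetical in-memory stand-in for the shop table from the question.
$shop = [
    ['id' => 1, 'tax_id' => 10], ['id' => 1, 'tax_id' => 10], ['id' => 1, 'tax_id' => 11],
    ['id' => 2, 'tax_id' => 10], ['id' => 2, 'tax_id' => 12], ['id' => 2, 'tax_id' => 10],
    ['id' => 2, 'tax_id' => 10],
];

function dedupeByPlace(array $rows): array
{
    // SET @i = 1; UPDATE shop SET place = (@i := @i + 1): the first row gets 2
    $i = 1;
    foreach ($rows as &$row) {
        $row['place'] = ++$i;
    }
    unset($row); // break the reference left behind by foreach

    // SELECT MIN(place) AS place FROM shop GROUP BY id, tax_id
    $minPlace = [];
    foreach ($rows as $row) {
        $key = $row['id'] . ':' . $row['tax_id'];
        $minPlace[$key] = min($minPlace[$key] ?? PHP_INT_MAX, $row['place']);
    }

    // DELETE FROM shop WHERE place NOT IN (...), then DROP COLUMN place
    $kept = [];
    foreach ($rows as $row) {
        if (in_array($row['place'], $minPlace, true)) {
            unset($row['place']);
            $kept[] = $row;
        }
    }
    return $kept;
}

$result = dedupeByPlace($shop);
```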

CBusBus Dec 21 '11 at 13:56

First, for the future, you can prevent this by creating a unique index on these two columns.

As for the fix: create a new table shopnew with the same structure in MySQL, or simply delete every row from shop once the record list below has been built (make sure you have a backup!):

    // Get every record from MySQL
    $sSQL = "SELECT id, tax_id FROM shop";
    $oRes = mysql_query($sSQL);
    $aRecordList = array();
    while ($aRow = mysql_fetch_assoc($oRes)) {
        // A duplicate maps to the same key, so it simply overwrites the earlier entry
        $aRecordList[$aRow['id'] . "." . $aRow['tax_id']] = 1;
    }
    // You could delete every record from shop here if you don't want an additional table
    // $aRecordList now contains only unique records
    foreach ($aRecordList as $sRecord => $bSet) {
        $aExpRecord = explode(".", $sRecord);
        mysql_query("INSERT INTO shopnew SET id = " . (int)$aExpRecord[0] . ", tax_id = " . (int)$aExpRecord[1]);
    }
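A database-free sketch of the same key-overwrite idea follows; uniquePairs and the row layout are assumptions made for illustration. The separator in the key matters: without one, the pairs (1, 10) and (11, 0) would both become the key "110" and one of them would be lost, which is why a character that cannot appear in an integer ("." here, as in the answer) is placed between the two values.

```php
<?php
// Hypothetical in-memory stand-in for the shop table from the question.
$shop = [
    ['id' => 1, 'tax_id' => 10], ['id' => 1, 'tax_id' => 10], ['id' => 1, 'tax_id' => 11],
    ['id' => 2, 'tax_id' => 10], ['id' => 2, 'tax_id' => 12], ['id' => 2, 'tax_id' => 10],
    ['id' => 2, 'tax_id' => 10],
];

// Collapses identical (id, tax_id) pairs by using them as an array key.
function uniquePairs(array $rows): array
{
    $seen = [];
    foreach ($rows as $row) {
        // "." never occurs in an integer, so different pairs never share a key
        $seen[$row['id'] . '.' . $row['tax_id']] = true;
    }
    $unique = [];
    foreach (array_keys($seen) as $key) {
        [$id, $taxId] = explode('.', (string) $key);
        $unique[] = ['id' => (int) $id, 'tax_id' => (int) $taxId];
    }
    return $unique;
}

$result = uniquePairs($shop);
```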

Derk arts Dec 21 '11 at 12:51

Perhaps this may help:

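    $query = "SELECT * FROM shop ORDER BY id";
    $rez = $dbh->query($query);
    $multi = $rez->fetchAll(PDO::FETCH_ASSOC);
    $done = array(); // (id, tax_id) pairs that have already been handled
    foreach ($multi as $key => $row) {
        $pair = $row['id'] . '.' . $row['tax_id'];
        if (isset($done[$pair])) {
            continue; // this row is itself a copy; its pair was handled earlier
        }
        $done[$pair] = true;
        // Compare with every later row and delete one copy per match,
        // so the row in $row itself survives
        foreach (array_slice($multi, $key + 1) as $other) {
            if ($row['id'] == $other['id'] && $row['tax_id'] == $other['tax_id']) {
                $dbh->exec("DELETE FROM shop WHERE id = " . (int)$other['id']
                    . " AND tax_id = " . (int)$other['tax_id'] . " LIMIT 1");
            }
        }
    }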

The outer foreach iterates over every row, and the inner one compares it with the rows that follow. I use PDO, but of course you can do the same with the procedural mysql_* functions.
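The important detail in a nested comparison is that once a row has been identified as a copy of an earlier one, the outer loop must skip it; otherwise a pair with three copies is matched several times and the original row can end up deleted as well. A hypothetical in-memory sketch (duplicateIndexes is an illustrative name) that returns the positions of the rows to delete and keeps the first copy of each pair:

```php
<?php
// Hypothetical in-memory stand-in for the shop table from the question.
$shop = [
    ['id' => 1, 'tax_id' => 10], ['id' => 1, 'tax_id' => 10], ['id' => 1, 'tax_id' => 11],
    ['id' => 2, 'tax_id' => 10], ['id' => 2, 'tax_id' => 12], ['id' => 2, 'tax_id' => 10],
    ['id' => 2, 'tax_id' => 10],
];

// Returns the positions of rows to delete; the first copy of each pair is kept.
function duplicateIndexes(array $rows): array
{
    $rows = array_values($rows); // make sure positions are 0..n-1
    $n = count($rows);
    $toDelete = [];
    for ($i = 0; $i < $n; $i++) {
        if (isset($toDelete[$i])) {
            continue; // already known to be a copy of an earlier row
        }
        for ($j = $i + 1; $j < $n; $j++) { // compare with every later row
            if ($rows[$j]['id'] == $rows[$i]['id'] && $rows[$j]['tax_id'] == $rows[$i]['tax_id']) {
                $toDelete[$j] = true;
            }
        }
    }
    return array_keys($toDelete);
}

$positions = duplicateIndexes($shop);
```

The nested loops make this O(n²) comparisons, so it is only practical for small tables; grouping by key is linear.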

neso-72 Oct 11 '12 at 13:03

With its current constraints, the question is actually a rather hard problem. I spent the whole evening thinking about it (knowing the answer would probably never be useful in practice). I would not use this solution in the wild; I just wanted to find out whether it can be done with MySQL alone.

In my wording, the question is: is it possible to write a series of DELETE statements that remove duplicate rows from a two-column table with no unique constraints?

Problems:

- Rows have no identity or primary key, so you have to find some other way to refer to the one row that should remain.
- We need to group the rows somehow, i.e. apply an ordering and then a condition, but the form of DELETE that supports ORDER BY allows only a WHERE clause and does not support HAVING, and the ordering is applied after the condition.
- We would not need to sort the rows if they were ordered by a clustered primary key, but we don't have one.

Suppose we have a table:

    CREATE TABLE `tablename` (
      `a_id` int(10) unsigned NOT NULL,
      `b_id` int(10) unsigned NOT NULL,
      KEY `Index_1` (`a_id`, `b_id`)
    ) ENGINE=InnoDB COLLATE utf8_bin;

I added a key (not UNIQUE or PRIMARY) to speed up lookups, hoping to use it for grouping.

Fill the table with some values:

    INSERT INTO tablename (a_id, b_id) VALUES (2, 3), (1, 1), (2, 2), (1, 4);
    INSERT INTO tablename (a_id, b_id) VALUES (2, 3), (1, 1), (2, 2), (1, 4);
    INSERT INTO tablename (a_id, b_id) VALUES (2, 3), (1, 1), (2, 2), (1, 4);

As a side effect, the key has become a covering index: when we SELECT from the table, the values come back sorted, but when we delete, the rows are read in the order in which they were inserted.

Now consider the following query:

    SELECT @c, @a_id AS a, @b_id AS b, a_id, b_id
    FROM tablename, (SELECT @a_id:=0, @b_id:=0, @c:=0) AS init
    WHERE (@c:=IF(LEAST(@a_id=(@a_id:=a_id), @b_id=(@b_id:=b_id)), @c+1, 1)) >= 1;

And its result:

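    +----+---+---+------+------+
    | @c | a | b | a_id | b_id |
    +----+---+---+------+------+
    |  1 | 1 | 1 |    1 |    1 |
    |  2 | 1 | 1 |    1 |    1 |
    |  3 | 1 | 1 |    1 |    1 |
    |  1 | 1 | 4 |    1 |    4 |
    |  2 | 1 | 4 |    1 |    4 |
    |  3 | 1 | 4 |    1 |    4 |
    |  1 | 2 | 2 |    2 |    2 |
    |  2 | 2 | 2 |    2 |    2 |
    |  3 | 2 | 2 |    2 |    2 |
    |  1 | 2 | 3 |    2 |    3 |
    |  2 | 2 | 3 |    2 |    3 |
    |  3 | 2 | 3 |    2 |    3 |
    +----+---+---+------+------+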

The results are automatically sorted by Index_1, and the @c column numbers the copies of each (a_id, b_id) pair. Now our task is to delete every row where @c > 1. The only problem is getting MySQL to use Index_1 when deleting, which is quite difficult without additional conditions. But we can do it with an equality check (or several equality checks) on a_id:

    DELETE FROM t USING tablename t FORCE INDEX (Index_1)
    JOIN (SELECT @a_id:=0, @b_id:=0, @c:=0) AS init
    WHERE a_id IN (1)
      AND (@c:=IF(LEAST(@a_id=(@a_id:=a_id), @b_id=(@b_id:=b_id)), @c+1, 1)) > 1;

    DELETE FROM t USING tablename t FORCE INDEX (Index_1)
    JOIN (SELECT @a_id:=0, @b_id:=0, @c:=0) AS init
    WHERE a_id IN (2)
      AND (@c:=IF(LEAST(@a_id=(@a_id:=a_id), @b_id=(@b_id:=b_id)), @c+1, 1)) > 1;

    SELECT * FROM tablename t;

    +------+------+
    | a_id | b_id |
    +------+------+
    |    1 |    1 |
    |    1 |    4 |
    |    2 |    2 |
    |    2 |    3 |
    +------+------+

I cannot put every possible a_id into IN(), because MySQL will then decide that the index is useless, and the query will not delete all duplicates (only adjacent ones). But with, say, 10 distinct a_id values, I can delete the duplicates with two DELETE statements, each with 5 explicit identifiers in its IN list.
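The user-variable expression is essentially a running counter over adjacent rows. A minimal PHP model of it follows; runningCopyNumbers and survivors are hypothetical names, and the model assumes the comparison sees the previous row's values before they are overwritten, which is the evaluation order the query relies on. It shows why the row order matters: over the sorted rows the counter reproduces the @c column above, while in insertion order no two neighbouring rows are equal, so nothing would be deleted.

```php
<?php
// Models @c := IF(LEAST(@a_id = a_id, @b_id = b_id), @c + 1, 1) over a sequence of rows.
function runningCopyNumbers(array $rows): array
{
    $prevA = 0; // @a_id := 0
    $prevB = 0; // @b_id := 0
    $c = 0;     // @c := 0
    $numbers = [];
    foreach ($rows as [$a, $b]) {
        // Compare with the previous row's values, then remember the current ones
        $c = ($a === $prevA && $b === $prevB) ? $c + 1 : 1;
        $prevA = $a;
        $prevB = $b;
        $numbers[] = $c;
    }
    return $numbers;
}

// Rows numbered 1 survive; the DELETE removes those with @c > 1.
function survivors(array $rows): array
{
    $numbers = runningCopyNumbers($rows);
    $kept = [];
    foreach (array_values($rows) as $i => $row) {
        if ($numbers[$i] === 1) {
            $kept[] = $row;
        }
    }
    return $kept;
}

// The rows from the three INSERT statements, in insertion order.
$inserted = [
    [2, 3], [1, 1], [2, 2], [1, 4],
    [2, 3], [1, 1], [2, 2], [1, 4],
    [2, 3], [1, 1], [2, 2], [1, 4],
];
$sorted = $inserted;
sort($sorted); // stands in for reading the rows through Index_1
```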

Hope this can be useful to someone =)

newtover Dec 21 '11 at 23:10

Sergio Tulentsev · Accepted Answer · 2011-12-21T12:58:47+0000

Here's an in-database solution (though not a one-liner):

Find max id:

 select max(id) as maxid from shop;

Remember this value. Let's say it is 1000.

Insert unique values with an offset:

 insert into shop (id, tax_id) select distinct id + 1000, tax_id from shop;

Discard old values:

 delete from shop where id <= 1000;

Restore the original ids:

 update shop set id = id - 1000;

PROFIT!
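The four steps can be replayed on an array to check the bookkeeping. This is a hypothetical model (dedupeWithOffset is an illustrative name) that makes one assumption explicit: every id must be positive. A row with id 0 would be shifted to exactly max(id) and then removed by the delete step together with the old rows.

```php
<?php
// Hypothetical in-memory stand-in for the shop table from the question.
$shop = [
    ['id' => 1, 'tax_id' => 10], ['id' => 1, 'tax_id' => 10], ['id' => 1, 'tax_id' => 11],
    ['id' => 2, 'tax_id' => 10], ['id' => 2, 'tax_id' => 12], ['id' => 2, 'tax_id' => 10],
    ['id' => 2, 'tax_id' => 10],
];

function dedupeWithOffset(array $rows): array
{
    if ($rows === []) {
        return []; // max() throws on an empty array in PHP 8
    }
    // select max(id) as maxid from shop;
    $offset = max(array_column($rows, 'id'));

    // insert into shop (id, tax_id) select distinct id + offset, tax_id from shop;
    $distinct = [];
    foreach ($rows as $row) {
        $distinct[$row['id'] . ':' . $row['tax_id']] = ['id' => $row['id'] + $offset, 'tax_id' => $row['tax_id']];
    }
    $rows = array_merge($rows, array_values($distinct));

    // delete from shop where id <= offset;
    $rows = array_values(array_filter($rows, fn(array $row): bool => $row['id'] > $offset));

    // update shop set id = id - offset;
    return array_map(fn(array $row): array => ['id' => $row['id'] - $offset, 'tax_id' => $row['tax_id']], $rows);
}

$result = dedupeWithOffset($shop);
```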
