Delete duplicate records without creating a temporary table

I have a table with many duplicate entries:

 shop ID  tax_id
 1        10
 1        10
 1        11
 2        10
 2        12
 2        10
 2        10

I want to delete all duplicate records without creating a temporary table. After the query runs, the table should look like this:

 shop ID  tax_id
 1        10
 1        11
 2        10
 2        12
5 answers

Here's an in-place solution (though not a one-liner).

Find max id:

 select max(id) as maxid from shop; 

Remember this value. Let's say it equals 1000.

Insert the unique values with an offset:

 insert into shop (id, tax_id) select distinct id + 1000, tax_id from shop; 

Discard old values:

 delete from shop where id <= 1000; 

Restore normal identifiers:

 update shop set id = id - 1000; 

PROFIT!
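The four steps above can be tried end to end in a small sketch. This one assumes Python's built-in sqlite3 module as a stand-in for MySQL and uses the sample data from the question; the offset is taken from max(id) rather than hard-coded, and it assumes all ids are positive.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shop (id INTEGER, tax_id INTEGER)")
conn.executemany(
    "INSERT INTO shop (id, tax_id) VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10), (2, 12), (2, 10), (2, 10)],
)

# Step 1: find the largest id currently in the table.
(maxid,) = conn.execute("SELECT max(id) FROM shop").fetchone()

# Step 2: insert one copy of each distinct row, shifted past maxid.
conn.execute(
    "INSERT INTO shop (id, tax_id) SELECT DISTINCT id + ?, tax_id FROM shop",
    (maxid,),
)

# Step 3: discard the original rows, duplicates included.
conn.execute("DELETE FROM shop WHERE id <= ?", (maxid,))

# Step 4: shift the surviving rows back to their original ids.
conn.execute("UPDATE shop SET id = id - ?", (maxid,))

rows = conn.execute("SELECT id, tax_id FROM shop ORDER BY id, tax_id").fetchall()
print(rows)
```

Running the steps inside one transaction would be a sensible precaution on a real table, since the data is briefly duplicated between steps 2 and 3.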


Working solution.

 -- SQL query to find duplicates
 SELECT id, tax_id, count(*) AS cnt
 FROM shop
 GROUP BY id, tax_id
 HAVING cnt > 1;

 -- res
 +------+--------+-----+
 | id   | tax_id | cnt |
 +------+--------+-----+
 |    1 |     10 |   2 |
 |    2 |     10 |   3 |
 +------+--------+-----+

 -- Iterate through the results with your language of choice
 DELETE FROM shop WHERE id=<res id> AND tax_id=<res tax_id> LIMIT <cnt - 1>;

 -- res (iterated)
 +------+--------+
 | id   | tax_id |
 +------+--------+
 |    1 |     10 |
 |    1 |     11 |
 |    2 |     12 |
 |    2 |     10 |
 +------+--------+

With the two queries, you need a little PHP to carry out the deletion:

 // Find every (id, tax_id) pair that occurs more than once
 $res = mysql_query("SELECT id, tax_id, count(*) AS cnt FROM shop GROUP BY id, tax_id HAVING cnt > 1");
 while ($row = mysql_fetch_assoc($res)) {
     // Delete all but one copy of the pair
     mysql_query("DELETE FROM shop WHERE id=" . $row['id'] . " AND tax_id=" . $row['tax_id'] . " LIMIT " . ($row['cnt'] - 1));
 }

Edit: This came up again recently, so for what it's worth, here is an alternative solution using a temporary column, which removes the need for a scripting language.

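 ALTER TABLE shop ADD COLUMN place INT;
 SET @i = 1;
 -- Give every row its own number
 UPDATE shop SET place = @i := @i + 1;
 -- Keep one numbered row per (id, tax_id) pair; the derived table avoids
 -- MySQL's error about selecting from the table being deleted from
 DELETE FROM shop WHERE place NOT IN (
     SELECT place FROM (SELECT MIN(place) AS place FROM shop GROUP BY id, tax_id) AS keep
 );
 ALTER TABLE shop DROP COLUMN place;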
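The same "number every row, keep one number per group" idea can be sketched in runnable form. This is not the MySQL code above: it assumes SQLite through Python's sqlite3 module, where the implicit rowid already plays the role of the temporary place column, so no ALTER TABLE is needed.

```python
import sqlite3

# In-memory SQLite database with the sample data from the question (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shop (id INTEGER, tax_id INTEGER)")
conn.executemany(
    "INSERT INTO shop (id, tax_id) VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10), (2, 12), (2, 10), (2, 10)],
)

# Keep the lowest rowid of each (id, tax_id) pair and delete every other row.
cur = conn.execute(
    "DELETE FROM shop WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM shop GROUP BY id, tax_id)"
)
deleted = cur.rowcount

remaining = conn.execute(
    "SELECT id, tax_id FROM shop ORDER BY id, tax_id"
).fetchall()
print(deleted, remaining)
```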

First, for future reference, you can prevent this by creating a unique index on these two fields.

As for the solution: create a new table shopnew with the same structure in MySQL, or simply delete every record from the original table once the record list has been built (make sure you have a backup!):
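As a rough illustration of what such an index buys you, the sketch below assumes SQLite via Python's sqlite3 module; the index name shop_id_tax_id is made up for the example. Once the index exists, a second insert of the same pair is rejected. Note that the index can only be created after the existing duplicates are gone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shop (id INTEGER, tax_id INTEGER)")
# Hypothetical index name; the point is the UNIQUE constraint on both columns.
conn.execute("CREATE UNIQUE INDEX shop_id_tax_id ON shop (id, tax_id)")

conn.execute("INSERT INTO shop (id, tax_id) VALUES (1, 10)")
try:
    # Same (id, tax_id) pair again: the unique index refuses it.
    conn.execute("INSERT INTO shop (id, tax_id) VALUES (1, 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT count(*) FROM shop").fetchone()[0]
print(rejected, count)
```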

 // Get every record from MySQL
 $sSQL = "SELECT id, tax_id FROM shop";
 $oRes = mysql_query($sSQL);
 $aRecordList = array();
 while ($aRow = mysql_fetch_assoc($oRes)) {
     // If the record is a duplicate, it will be 'overwritten'
     $aRecordList[$aRow['id'] . "." . $aRow['tax_id']] = 1;
 }
 // You could delete every record from shop here if you don't want an additional table
 // $aRecordList now only contains unique records
 foreach ($aRecordList as $sRecord => $bSet) {
     $aExpRecord = explode(".", $sRecord);
     mysql_query("INSERT INTO shopnew SET id=" . $aExpRecord[0] . ", tax_id=" . $aExpRecord[1]);
 }

Perhaps this may help:

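 $query = "SELECT * FROM shop ORDER BY id";
 $rez = $dbh->query($query);
 $multi = $rez->fetchAll(PDO::FETCH_ASSOC);
 $deleted = array(); // keys of rows already removed as duplicates
 foreach ($multi as $key => $row) {
     if (isset($deleted[$key])) {
         continue;
     }
     // Compare against the rows that follow, preserving their original keys
     foreach (array_slice($multi, $key + 1, null, true) as $restKey => $rest) {
         if ($row['id'] == $rest['id'] && $row['tax_id'] == $rest['tax_id']) {
             // LIMIT 1 removes a single copy, so one row of each pair survives
             $dbh->query("DELETE FROM shop WHERE id={$rest['id']} AND tax_id={$rest['tax_id']} LIMIT 1");
             $deleted[$restKey] = true;
         }
     }
 }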

The first foreach iterates over every row, and the second does the comparison. I use PDO, but of course you can do it procedurally.


In fact, with its current restrictions the question is a rather difficult problem. I spent the whole evening thinking about a solution (knowing full well it would never be useful). I would not use it in the wild; I just wanted to find out whether this can be done using MySQL alone.

The question, in my own words: is it possible to write a series of DELETE statements that remove duplicate rows from a two-column table without unique constraints?

Problems:

  • rows have no identifier or primary key, so we need some way to refer to the single row that should remain
  • we need to group the rows somehow, that is, apply an ordering and then a condition, but the form of DELETE that supports ORDER BY can only have a WHERE clause and does not support HAVING, so the ordering is applied after the condition has been evaluated
  • we would not need to sort the rows if the values were ordered by a clustered primary key, but we don't have one

Suppose we have a table:

 CREATE TABLE `tablename` (
   `a_id` int(10) unsigned NOT NULL,
   `b_id` int(10) unsigned NOT NULL,
   KEY `Index_1` (`a_id`,`b_id`)
 ) ENGINE=InnoDB COLLATE utf8_bin;

I added a key (neither UNIQUE nor PRIMARY) to speed up lookups, hoping to use it for grouping.

You can fill the table with some values:

 INSERT INTO tablename (a_id, b_id) VALUES (2, 3), (1, 1), (2, 2), (1, 4);
 INSERT INTO tablename (a_id, b_id) VALUES (2, 3), (1, 1), (2, 2), (1, 4);
 INSERT INTO tablename (a_id, b_id) VALUES (2, 3), (1, 1), (2, 2), (1, 4);

As a side effect, the key has become a covering index, so when we SELECT from the table the values come back sorted, but when we delete, the rows are read in the order in which we inserted them.

Now consider the following query:

 SELECT @c, @a_id as a, @b_id as b, a_id, b_id
 FROM tablename, (SELECT @a_id:=0, @b_id:=0, @c:=0) as init
 WHERE (@c:=IF(LEAST(@a_id=(@a_id:=a_id), @b_id=(@b_id:=b_id)), @c+1, 1)) >= 1;

And its result:

 @c, a, b, a_id, b_id
 1, 1, 1, 1, 1
 2, 1, 1, 1, 1
 3, 1, 1, 1, 1
 1, 1, 4, 1, 4
 2, 1, 4, 1, 4
 3, 1, 4, 1, 4
 1, 2, 2, 2, 2
 2, 2, 2, 2, 2
 3, 2, 2, 2, 2
 1, 2, 3, 2, 3
 2, 2, 3, 2, 3
 3, 2, 3, 2, 3
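To make the role of @c concrete, here is a rough Python sketch of the same counting logic, outside MySQL. The function name number_duplicates is made up for the illustration, and the explicit sort stands in for the ordering that the index provides: the counter restarts at 1 whenever the pair changes and increments while it repeats.

```python
def number_duplicates(rows):
    """Yield (c, a_id, b_id), where c numbers the copies of each equal pair."""
    prev = None
    c = 0
    # Sorting puts equal pairs next to each other, as the index does in MySQL.
    for pair in sorted(rows):
        c = c + 1 if pair == prev else 1
        prev = pair
        yield (c, *pair)


# The same data as the three INSERT statements in the answer.
rows = [(2, 3), (1, 1), (2, 2), (1, 4)] * 3
numbered = list(number_duplicates(rows))

# Rows with c > 1 are the surplus copies that would have to be deleted.
to_delete = [r for r in numbered if r[0] > 1]
print(len(numbered), len(to_delete))
```

The counter only works when equal pairs are adjacent, which is why the read order matters so much in what follows.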

The results are automatically sorted using Index_1, and the duplicates of each (a_id, b_id) pair are numbered in the @c column. Now our task is to delete all the rows where @c > 1. The only problem is getting MySQL to use Index_1 when deleting, which is quite difficult without additional conditions. But we can do it with an equality check, or several equality checks, on a_id:

 DELETE FROM t USING tablename t FORCE INDEX (Index_1)
 JOIN (SELECT @a_id:=0, @b_id:=0, @c:=0) as init
 WHERE a_id IN (1)
   AND (@c:=IF(LEAST(@a_id=(@a_id:=a_id), @b_id=(@b_id:=b_id)), @c+1, 1)) > 1;

 DELETE FROM t USING tablename t FORCE INDEX (Index_1)
 JOIN (SELECT @a_id:=0, @b_id:=0, @c:=0) as init
 WHERE a_id IN (2)
   AND (@c:=IF(LEAST(@a_id=(@a_id:=a_id), @b_id=(@b_id:=b_id)), @c+1, 1)) > 1;

 SELECT * FROM tablename t;

 a_id, b_id
 1, 1
 1, 4
 2, 2
 2, 3

I cannot put every possible a_id into a single IN(), because MySQL would then decide that the index is useless, and the query would not delete all duplicates (only adjacent ones). But with, say, 10 different a_id values I can delete the duplicates in two DELETE statements, each with 5 explicit identifiers in its IN().

Hope this can be useful to someone =)


Source: https://habr.com/ru/post/1387441/

