Check for duplicates in the database and delete them

I have a table structured as follows:

table(A, B)

A and B together form the primary key, and each pair links two records in another table (i.e. a row represents a friendship between two users).

I need to check the table and, if (A, B) exists, delete any matching (B, A) (or vice versa).
Since the database is huge, I cannot do it manually for each individual record every time.

Of course, I wrote the script that populated the database so that it checks for this situation and avoids it, but we ran the script on 8 different PCs, so the different dumps can contain reversed duplicates of each other.

1 answer

The problem arises because the relationship you are trying to describe is symmetric, but the schema models an asymmetric relationship. The right way to model the problem would be to maintain a relationship table, plus a table linking users to relationships, for example:

CREATE TABLE relationship (
   id INT AUTO_INCREMENT PRIMARY KEY
);

CREATE TABLE related (
   r_id INT NOT NULL,
   u_id INT NOT NULL,
   PRIMARY KEY (r_id, u_id),
   FOREIGN KEY (r_id) REFERENCES relationship (id),
   FOREIGN KEY (u_id) REFERENCES user (id)
);
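
To make the idea concrete, here is a minimal sketch of that layout using Python's built-in sqlite3 module (SQLite rather than MySQL, only so that it runs without a server). The sample users and the `befriend` and `friends_of` helpers are hypothetical names invented for the illustration; they are not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationship (id INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE related (
        r_id INTEGER NOT NULL REFERENCES relationship (id),
        u_id INTEGER NOT NULL REFERENCES user (id),
        PRIMARY KEY (r_id, u_id)
    );
""")
# Hypothetical sample users.
conn.executemany("INSERT INTO user (id, name) VALUES (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cat")])

def befriend(conn, u1, u2):
    """Hypothetical helper: one relationship row plus one membership row per user."""
    r_id = conn.execute("INSERT INTO relationship DEFAULT VALUES").lastrowid
    conn.executemany("INSERT INTO related (r_id, u_id) VALUES (?, ?)",
                     [(r_id, u1), (r_id, u2)])
    return r_id

def friends_of(conn, u):
    """Hypothetical helper: ids of all users sharing a relationship with user u."""
    rows = conn.execute("""
        SELECT other.u_id
        FROM related me
        JOIN related other
          ON other.r_id = me.r_id AND other.u_id <> me.u_id
        WHERE me.u_id = ?
        ORDER BY other.u_id
    """, (u,)).fetchall()
    return [row[0] for row in rows]

befriend(conn, 1, 2)   # argument order carries no meaning here
befriend(conn, 3, 1)
```

With this layout a friendship has no direction, so there is no (B, A) row to mirror an (A, B) row. Note that the sketch does not by itself stop the same two users from being linked through two separate relationship rows; that check would still have to be made when inserting.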

But to clean up the existing data, an obvious approach would be:

DELETE FROM yourtable d
WHERE A>B AND EXISTS (
    SELECT 1 
    FROM yourtable r
    WHERE r.A=d.B
    AND r.B =d.A
)

However, if I recall correctly, MySQL does not allow a subselect in a DELETE to refer to the same table that is being deleted from. So....

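-- MySQL has no SELECT ... INTO <table>; CREATE TABLE ... AS SELECT does the same job
CREATE TABLE dups AS
SELECT d.A, d.B
FROM yourtable d, yourtable r
WHERE d.A > d.B
AND r.A = d.B
AND r.B = d.A;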

then ....

DELETE FROM yourtable
WHERE EXISTS (
 SELECT 1 FROM dups
 WHERE dups.A=yourtable.A
 AND dups.B=yourtable.B
)

I am not sure whether the pushed predicate will still cause the same problem, so if that doesn't work ...

DELETE FROM yourtable
WHERE CONCAT(A, '/', B) IN (
 SELECT CONCAT(A, '/', B) FROM dups
)
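
For anyone who wants to try the two-step cleanup on a small sample before touching real data, here is a rough end-to-end sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database instead of MySQL, so it only demonstrates the logic, not MySQL's restrictions. The sample rows and the `remove_mirrored` function name are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (A INTEGER, B INTEGER, PRIMARY KEY (A, B));
    -- Hypothetical sample data: (1,2)/(2,1) and (7,8)/(8,7) are mirrored pairs,
    -- (3,4) and (6,5) have no mirror and must survive.
    INSERT INTO yourtable (A, B)
    VALUES (1, 2), (2, 1), (3, 4), (6, 5), (7, 8), (8, 7);
""")

def remove_mirrored(conn):
    """Delete the A > B half of every mirrored pair; return the number of rows removed."""
    # Step 1: stage the rows to delete in a scratch table.
    conn.execute("""
        CREATE TEMP TABLE dups AS
        SELECT d.A, d.B
        FROM yourtable d
        JOIN yourtable r ON r.A = d.B AND r.B = d.A
        WHERE d.A > d.B
    """)
    # Step 2: delete only the staged rows from the real table.
    deleted = conn.execute("""
        DELETE FROM yourtable
        WHERE EXISTS (
            SELECT 1 FROM dups
            WHERE dups.A = yourtable.A AND dups.B = yourtable.B
        )
    """).rowcount
    conn.execute("DROP TABLE dups")
    conn.commit()
    return deleted

deleted = remove_mirrored(conn)
remaining = conn.execute("SELECT A, B FROM yourtable ORDER BY A, B").fetchall()
print(deleted, remaining)
```

The `A > B` condition is what keeps exactly one row of each mirrored pair. A row such as (6, 5) is left alone because it has no (5, 6) counterpart.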

Source: https://habr.com/ru/post/1780415/

