Mysql relational database duplicated with different keys

I have been trying to fix relational db for a month, but I cannot find an effective solution.

This is my problem: I have 534 M lines of Relational Db with lots of foreig (30) keys.

I can handle normal duplicates with a union ... group by ... havin count (*) = 1 by inserting, but there are also duplicates with different keys.

Example:

Table 1

id | key1 | value 1 | 11 | a1 2 | 22 | a1 

table 2

 key1 | value 11 | a2 22 | a2 Foreign key table1(key1) references table2(key1) 

I try to find, remove duplicate, fix parents. I tried 3 different ways,

1: PHP Script, arrays

export tables (dump) -> array_unique, find duplicates, fix parent array -> import tables

Its pretty fast, but it needs 80 GB of memory, which may be a problem in the future

2: PHP Script, SQL Query

tables exporrt (dump) -> find duplicates -> send queries to the parent table

There is no need for memory, but the tables are really large and 5 queries take 1 second, 50 M duplicates will take days, months, years

3: Enabling DUPLICATE UPDATE KEY: I added one duplicate column to store duplicate keys, and I defined all columns except the key as a unique key,
insert ... to the duplicate update concat (duplicate, ';', VALUES (key)).

But some tables have more than 1 key, and sometimes I have to define 24 columns as a unique index and memory problem again

I hope I can explain my problem. Do you have any ideas?

+4
source share
1 answer

Why don't you just create a unique column. Just use the Ignore keyword to remove duplicate entries.
So your query would look something like this:
ALTER IGNORE TABLE testdb . table1 ADD A UNIQUE INDEX column1 ( column1 ASC);

0
source

Source: https://habr.com/ru/post/1433420/


All Articles