MYSQL Deduplicate and delete the duplicate row with the smallest data

I am working on a MYSQL database, which has the following three columns: emails, first name, last name.

What I need to do is deduplicate emails, where I know that I can use a function like this (this request is simple to sort not delete):

select distinct emails, name, surname from emails; 

or

 select emails, name, surname from emails group by emails having count(*) >= 2; 

However, I also need to make sure that if a duplicate email address is stored in the list, it has a first and / or last name.

For instance:

| id | Email and NBSP; | name | last name | | 1 | bob@bob.com | bean | Paulson |
| 2 | bob@bob.com ; |

In this case, I would like to save the first result and delete the second.

I studied the use of case or if statements, but I have not experienced such problems. I tried to extend the above functions using these instructions, but to no avail.

Can someone point me in the right direction?

PS: The first column in the table is an auto-increment value that helps

UPDATE 1: So far, @Bohemian's answer works fine, but fails in one case when there is a duplicate email address, where on one line it has a name but no last name, and on the next line it does not have a name but has surname. It will store both records. All you need to edit is to delete one of these two entries, no matter what.

UPDATE 2: @Bohemian the answer is great, but after more testing, I found that it has a fundamental flaw in that it only works when there is a duplicate email line in which the first and last name fields have data (for example, the first entries in table above). If there are duplicate e-mails, but none of the lines fill in the first and last name fields, then all these lines will be ignored and not deduplicated.

The final step for this query would be to figure out how to remove duplicates that do not meet current prerequisites. If one line has only a first name and the other just a last name, it really does not matter what will be deleted, since the letter is an important thing.

+4
source share
3 answers

You can use this DELETE query, which is general and can be easily adapted to support more fields:

 DELETE tablename.* FROM tablename LEFT JOIN ( SELECT MIN(id) min_id FROM tablename t INNER JOIN ( SELECT emails, MAX((name IS NOT NULL) + (surname IS NOT NULL)) max_non_nulls FROM tablename GROUP BY emails) m ON t.emails=m.emails AND ((t.name IS NOT NULL) + (t.surname IS NOT NULL))=m.max_non_nulls GROUP BY t.emails) ids ON tablename.id=ids.min_id WHERE ids.min_id IS NULL 

See the fiddle here .

This query returns the maximum number of nonzero fields for each message:

 SELECT emails, MAX((name IS NOT NULL) + (surname IS NOT NULL)) max_non_nulls FROM tablename GROUP BY emails 

Then I join this query with tablename to get the minimum identifier for each email with the maximum number of non-zero fields:

 SELECT MIN(id) min_id FROM tablename t INNER JOIN ( SELECT emails, MAX((name IS NOT NULL) + (surname IS NOT NULL)) max_non_nulls FROM tablename GROUP BY emails) m ON t.emails=m.emails AND ((t.name IS NOT NULL) + (t.surname IS NOT NULL))=m.max_non_nulls GROUP BY t.emails 

and then I delete all rows that have an identifier that is not returned by this request.

+3
source

This is easy with the mysql syntax to delete multiple tables :

 delete b from mytable a join mytable b on a.email = b.email and a.id != b.id where a.name is not null and a.surname is not null 
+4
source

Delete double email id entry

 delete from duplicate_email where id in( select id from ( select id, email from duplicate_email group by email having count(id) > 1) as id ) 

but there is one problem, you can delete this entry that has only one duplicate email. Two identical email addresses, but if there are three or more, you can repeat this request until you get a null entry.

0
source

Source: https://habr.com/ru/post/1494679/


All Articles