I have a customer database populated with normalized addresses. There are duplicates.
Each user created their own entry and entered their own address. Thus, we have a 1 to 1 relationship between users and addresses:
CREATE TABLE `users` ( `UserID` INT UNSIGNED NOT NULL AUTO_INCREMENT, `Name` VARCHAR(63), `Email` VARCHAR(63), `AddressID` INT UNSIGNED, PRIMARY KEY (`UserID`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `addresses` ( `AddressID` INT UNSIGNED NOT NULL AUTO_INCREMENT, `Duplicate` VARCHAR(1), `Address1` VARCHAR(63) DEFAULT NULL, `Address2` VARCHAR(63) DEFAULT NULL, `City` VARCHAR(63) DEFAULT NULL, `State` VARCHAR(2) DEFAULT NULL, `ZIP` VARCHAR(10) DEFAULT NULL, PRIMARY KEY (`AddressID`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And the data:
INSERT INTO `users` VALUES (1, 'Michael', ' michael@email.com ', 1), (2, 'Steve', ' steve@email.com ', 2), (3, 'Judy', ' judy@email.com ', 3), (4, 'Kathy', ' kathy@email.com ', 4), (5, 'Mark', ' mark@email.com ', 5), (6, 'Robert', ' robert@email.com ', 6), (7, 'Susan', ' susan@email.com ', 7), (8, 'Paul', ' paul@email.com ', 8), (9, 'Patrick', ' patrick@email.com ', 9), (10, 'Mary', ' mary@email.com ', 10), (11, 'James', ' james@email.com ', 11), (12, 'Barbara', ' barbara@email.com ', 12), (13, 'Peter', ' peter@email.com ', 13); INSERT INTO `addresses` VALUES (1, '', '1234 Main Street', '', 'Springfield', 'KS', '54321'), (2, 'Y', '1234 Main Street', '', 'Springfield', 'KS', '54321'), (3, 'Y', '1234 Main Street', '', 'Springfield', 'KS', '54321'), (4, '', '5678 Sycamore Lane', '', 'Upstate', 'NY', '50000'), (5, '', '1000 State Street', 'Apt C', 'Sunnydale', 'OH', '54321'), (6, 'Y', '1234 Main Street', '', 'Springfield', 'KS', '54321'), (7, 'Y', '1000 State Street', 'Apt C', 'Sunnydale', 'OH', '54321'), (8, 'Y', '1234 Main Street', '', 'Springfield', 'KS', '54321'), (9, '', '1000 State Street', 'Apt A', 'Sunnydale', 'OH', '54321'), (10, 'Y', '1234 Main Street', '', 'Springfield', 'KS', '54321'), (11, 'Y', '5678 Sycamore Lane', '', 'Upstate', 'NY', '50000'), (12, 'Y', '1000 Main Street', 'Apt A', 'Sunnydale', 'OH', '54321'), (13, '', '9999 Valleyview', '', 'Springfield', 'KS', '54321');
Oh yes, let me add that relation of foreign keys:
ALTER TABLE `users` ADD CONSTRAINT `AddressID` FOREIGN KEY `AddressID` (`AddressID`) REFERENCES `addresses` (`AddressID`);
We had a list of addresses cleared by a third-party service that normalized the data and indicated where we had duplicates. This is where the Duplicate column came from. If there is a "Y", this is a duplicate of another address. The primary address is NOT marked as duplicate, as shown in the sample data.
I obviously want to delete all duplicate entries, but there are user entries that point to them. I need them to point to a version of the address that is not a duplicate.
So how can I update the AddressID in users to match non-dual addresses?
The only way I can do this is to repeat all the data using a high-level language, but I am sure that MySQL has all the tools necessary to make something like this better.
Here is what I tried:
SELECT COUNT(*) as cnt, GROUP_CONCAT(AddressID ORDER BY AddressID) AS ids FROM addresses GROUP BY Address1, Address2, City, State, ZIP HAVING cnt > 1; +
From there, I could iterate over each line of the result and do this:
UPDATE `users` SET `AddressID` = 1 WHERE `AddressID` IN (2,3,6,8,10);
But there must be a better way only for MySQL, shouldn't it?
When all is said and done, the data MUST look like this:
SELECT * FROM `users`; +--------+---------+-------------------+-----------+ | UserID | Name | Email | AddressID | +--------+---------+-------------------+-----------+ | 1 | Michael | michael@email.com | 1 | | 2 | Steve | steve@email.com | 1 | | 3 | Judy | judy@email.com | 1 | | 4 | Kathy | kathy@email.com | 4 | | 5 | Mark | mark@email.com | 5 | | 6 | Robert | robert@email.com | 1 | | 7 | Susan | susan@email.com | 5 | | 8 | Paul | paul@email.com | 1 | | 9 | Patrick | patrick@email.com | 9 | | 10 | Mary | mary@email.com | 1 | | 11 | James | james@email.com | 4 | | 12 | Barbara | barbara@email.com | 1 | | 13 | Peter | peter@email.com | 13 | +--------+---------+-------------------+-----------+ 13 rows in set (0.00 sec) SELECT * FROM `addresses`; +-----------+-----------+--------------------+----------+-------------+-------+-------+ | AddressID | Duplicate | Address1 | Address2 | City | State | ZIP | +-----------+-----------+--------------------+----------+-------------+-------+-------+ | 1 | | 1234 Main Street | | Springfield | KS | 54321 | | 4 | | 5678 Sycamore Lane | | Upstate | NY | 50000 | | 5 | | 1000 State Street | Apt C | Sunnydale | OH | 54321 | | 9 | | 1000 State Street | Apt A | Sunnydale | OH | 54321 | | 13 | | 9999 Valleyview | | Springfield | KS | 54321 | +-----------+-----------+--------------------+----------+-------------+-------+-------+ 5 rows in set (0.00 sec)
reference