MySQL: changing collation from utf8_bin to utf8_unicode_ci

Given a populated table, how can I change the collation from utf8_bin to utf8_unicode_ci? A plain ALTER query does not work because of "Duplicate entry" errors. For example, there are two entries

David Hussa 

and

 David Hußa 

I know that they are the same. Is there an elegant way to tell MySQL to "merge" such records? I should mention that the record id is used as a reference in other tables, so those references must be respected too. Or do I have to do it the long and annoying way: merge each duplicate manually and then change the collation?

The table looks like this:

    delimiter $$

    CREATE TABLE `authors` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `name` varchar(100) COLLATE utf8_bin NOT NULL,
      `count` int(11) NOT NULL DEFAULT '1',
      PRIMARY KEY (`id`),
      UNIQUE KEY `name_UNIQUE` (`name`),
      FULLTEXT KEY `name_FULLTEXT` (`name`)
    ) ENGINE=MyISAM AUTO_INCREMENT=930710 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Stores all authors from dblp.xml.'$$
1 answer

You can delete duplicate entries:

    DELETE a2
    FROM authors a1
    JOIN authors a2
      ON a2.name COLLATE utf8_unicode_ci = a1.name COLLATE utf8_unicode_ci
     AND a2.id < a1.id

Please note that this can be time consuming if your table is large.
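
Before deleting anything, it may help to preview which names collide under the target collation. The following is only a sketch against the authors table from the question; the aliases name_ci, n and ids are illustrative, and GROUP_CONCAT output is truncated at group_concat_max_len for very large groups.

```sql
-- List every group of names that are distinct under utf8_bin
-- but compare equal under utf8_unicode_ci.
SELECT name COLLATE utf8_unicode_ci AS name_ci,   -- grouping key in the target collation
       COUNT(*)                     AS n,         -- number of colliding rows
       GROUP_CONCAT(id ORDER BY id) AS ids        -- ids involved; the highest one is kept by the DELETE
FROM authors
GROUP BY name_ci
HAVING COUNT(*) > 1;
```

Each row of the result is one set of duplicates; all ids except the largest in each set are the ones the DELETE would remove.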

It would be better to do this:

  • Drop the UNIQUE constraint

  • Change the collation

  • Create a simple, non-unique index on name

  • Run the query (without the COLLATE clauses):

     DELETE a2
     FROM authors a1
     JOIN authors a2
       ON a2.name = a1.name
      AND a2.id < a1.id

  • Drop the temporary index

  • Recreate the UNIQUE constraint.
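
The steps above could be written out roughly as follows. This is a sketch, not a tested migration: the index name name_tmp is made up for illustration, and the statements assume the authors table exactly as defined in the question. Since the table is MyISAM there is no transaction to roll back, so taking a backup first and updating the referencing tables before the DELETE seems prudent.

```sql
-- 1. Drop the UNIQUE constraint so the collation change cannot fail on duplicates.
ALTER TABLE authors DROP INDEX name_UNIQUE;

-- 2. Change the collation of the column.
ALTER TABLE authors
  MODIFY name VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;

-- 3. Add a plain, non-unique index (name_tmp is a hypothetical name)
--    so the self-join below can look names up through an index.
ALTER TABLE authors ADD INDEX name_tmp (name);

-- 4. Delete duplicates, keeping the row with the highest id in each group.
--    No COLLATE is needed: the column now compares under utf8_unicode_ci.
DELETE a2
FROM authors a1
JOIN authors a2
  ON a2.name = a1.name
 AND a2.id < a1.id;

-- 5. Drop the temporary index.
ALTER TABLE authors DROP INDEX name_tmp;

-- 6. Recreate the UNIQUE constraint under the new collation.
ALTER TABLE authors ADD UNIQUE KEY name_UNIQUE (name);
```

Note that MODIFY only changes the column; the table's default collation (utf8_bin in the CREATE TABLE above) stays as it is unless it is altered separately.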

To update the referencing tables, run this query on every one of them before deleting the duplicates (child stands for the referencing table and author for its reference column):

    UPDATE child c
    JOIN (
        (
            SELECT name COLLATE utf8_unicode_ci AS name_ci,
                   MAX(id) AS mid
            FROM authors
            GROUP BY name_ci
        ) pa
        JOIN authors a
          ON a.name COLLATE utf8_unicode_ci = pa.name_ci
    ) ON c.author = a.id
    SET c.author = pa.mid;
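
As a sanity check before deleting, one could count how many references still point at a row that is about to be removed. This is a sketch that assumes the same placeholder names as the UPDATE (child for the referencing table, author for its reference column); the alias dangling is illustrative.

```sql
-- Count child rows whose author is not the surviving (highest-id) row
-- of its duplicate group under utf8_unicode_ci.
SELECT COUNT(*) AS dangling
FROM child c
JOIN authors a
  ON a.id = c.author
JOIN (
    SELECT name COLLATE utf8_unicode_ci AS name_ci,
           MAX(id) AS mid
    FROM authors
    GROUP BY name_ci
) pa
  ON a.name COLLATE utf8_unicode_ci = pa.name_ci
WHERE a.id <> pa.mid;
```

If the UPDATE has been applied to that table, no reference should remain that would be orphaned by the DELETE, so a non-zero count suggests the table was missed or has changed since.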


Source: https://habr.com/ru/post/1346258/

