How to fix double-encoded UTF-8 characters (in a utf-8 table)

A previous LOAD DATA INFILE was run under the assumption that the CSV file was latin1-encoded. During this import, each multibyte character was interpreted as two single characters and then encoded using utf-8 (again).

This double encoding created anomalies like Ã± instead of ñ.
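
To make the mechanism concrete, here is a minimal Python sketch that reproduces the anomaly outside MySQL. Python is used purely for illustration and was not part of the original import. The function name `double_encode` is hypothetical, and the sketch assumes that MySQL's latin1 behaves like Windows-1252 (`cp1252`) for the bytes involved.

```python
def double_encode(text: str) -> str:
    """Simulate the faulty import: UTF-8 bytes misread as latin1 (cp1252 assumed)."""
    # The CSV file really contained UTF-8 bytes.
    file_bytes = text.encode("utf-8")
    # The import decoded those bytes as latin1, so each byte became one character.
    return file_bytes.decode("cp1252")


mojibake = double_encode("ñ")
print(mojibake)                        # Ã±
# Storing the misread text in a utf-8 column encodes it a second time.
print(mojibake.encode("utf-8").hex())  # c383c2b1
```

The single character ñ (bytes C3 B1) ends up stored as four bytes, which is why it displays as two characters.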

How can these strings be corrected?

+43
string mysql utf-8 character-encoding
Jul 11 2018-12-12T00:
2 answers

The following MySQL expression returns the correct utf8 string from a double-encoded value:

 CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8) 

It can be used in an UPDATE statement to correct the fields:

 UPDATE tablename SET field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8); 
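
The expression works by undoing the faulty import step by step: converting to latin1 turns each character back into a single byte, the cast to BINARY drops the character set label, and the final conversion reads those bytes as utf8. A rough Python equivalent of the same round trip, for illustration only (the name `fix_double_encoding` is hypothetical, and `cp1252` is assumed to stand in for MySQL's latin1):

```python
def fix_double_encoding(text: str) -> str:
    """Reverse a latin1/UTF-8 double encoding (cp1252 assumed for latin1)."""
    # Like CONVERT(field USING latin1): map each character back to one byte.
    raw = text.encode("cp1252")
    # Like CAST(... AS BINARY) followed by CONVERT(... USING utf8):
    # reinterpret the recovered bytes as UTF-8.
    return raw.decode("utf-8")


print(fix_double_encoding("Ã±"))  # ñ
```

Note that this sketch raises an exception on text that was never double-encoded, so it is only safe on values known to be damaged.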
+86
Jul 11 '12 at 15:56

The above answer worked for some of my data, but after running it, many columns ended up NULL. My guess is that the conversion returns NULL when it does not succeed. To avoid this, I added a small check that keeps the original value in that case:

 UPDATE tbl SET col = CASE WHEN CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8) IS NULL THEN col ELSE CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8) END; 
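
The same "keep the original if the conversion fails" idea can be sketched in Python, where a failed conversion shows up as an exception rather than NULL. This is an analogy under stated assumptions, not a description of MySQL's internals: the name `fix_or_keep` is hypothetical and `cp1252` is assumed to stand in for MySQL's latin1.

```python
def fix_or_keep(text: str) -> str:
    """Repair a double-encoded string, or return it unchanged if that fails."""
    try:
        # Same round trip as the SQL expression: characters -> bytes -> UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Roughly the IS NULL branch of the CASE: the value was not
        # double-encoded (or is damaged differently), so leave it alone.
        return text


print(fix_or_keep("Ã±"))  # ñ  (repaired)
print(fix_or_keep("ñ"))   # ñ  (already correct, left untouched)
```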
+6
Aug 10 '16 at 15:12