MySQL UTF-8 / Unicode migration tips

Does anyone have any tips or pitfalls to look out for when migrating MySQL tables from the default case-insensitive Swedish collation (latin1_swedish_ci) or ASCII character sets to UTF-8? Some of the projects I participate in are striving for better internationalization, and the database will be a significant part of this change.

Before we move on to changing the database, we are going to transform each site to use the UTF-8 character encoding (from least critical to most) to ensure that all input / output data uses the same character set.

Thanks for any help

+4
5 answers

Some tips:

  • CHAR and VARCHAR columns can use up to 3 times more disk space, since MySQL's utf8 allows up to 3 bytes per character. (For mostly Swedish text, the actual growth will probably be small.)
  • Issue SET NAMES utf8 before reading from or writing to the database. If you do not, you will get partially garbled characters.
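
The garbling mentioned in the second tip can be reproduced without a database. This is a minimal Python sketch, assuming the client sends UTF-8 bytes while the connection still treats them as latin1. The sample word and the helper name `misread_as_latin1` are illustrative only, and Python's `latin-1` codec stands in for MySQL's latin1.

```python
# Sketch: what a charset mismatch does to non-ASCII text.
# Assumption: the client sends UTF-8 bytes, but the connection
# interprets them as latin1 (i.e. SET NAMES utf8 was never issued).
# Python's "latin-1" codec is used as a stand-in for MySQL's latin1.

def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the same bytes as latin1."""
    return text.encode("utf-8").decode("latin-1")

sample = "smörgåsbord"  # illustrative sample word
garbled = misread_as_latin1(sample)

print(garbled)                     # ASCII letters survive, ö and å do not
print(len(sample), len(garbled))   # each non-ASCII character became two
```

Only the non-ASCII characters are damaged, which is why the result looks "partially" garbled: plain ASCII text passes through unchanged and the problem can go unnoticed for a while.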
+2

Beware of index length restrictions. If the table is structured like, say:

    a varchar(255)
    b varchar(255)
    key (a, b)

you will exceed the 1000-byte limit on key length. 255 + 255 is fine, but 255 * 3 + 255 * 3 will not work, because each utf8 character can take up to 3 bytes.
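
The arithmetic above can be checked with a few lines of Python. This is a rough sketch, not an exact model of MySQL: it assumes the 1000-byte limit quoted in this answer, 1 byte per character for latin1 and 3 for utf8, and it ignores any per-column overhead. The names `key_bytes` and `max_equal_prefix` are hypothetical helpers.

```python
# Sketch: worst-case key length for a composite index.
# Assumptions: 1000-byte key limit (the figure quoted in the answer),
# 1 byte per character for latin1, 3 bytes for MySQL's utf8.
# Per-column length overhead is ignored for simplicity.

KEY_LIMIT_BYTES = 1000
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}

def key_bytes(column_lengths, charset):
    """Worst-case bytes needed to index the given VARCHAR lengths."""
    return sum(n * BYTES_PER_CHAR[charset] for n in column_lengths)

def max_equal_prefix(num_columns, charset):
    """Longest per-column prefix (in characters) that fits within the limit."""
    return KEY_LIMIT_BYTES // (num_columns * BYTES_PER_CHAR[charset])

columns = [255, 255]  # a varchar(255), b varchar(255)
for charset in BYTES_PER_CHAR:
    needed = key_bytes(columns, charset)
    verdict = "fits" if needed <= KEY_LIMIT_BYTES else "too long"
    print(charset, needed, verdict)

print(max_equal_prefix(len(columns), "utf8"))
```

Under these assumptions the two-column key needs 510 bytes in latin1 and 1530 bytes in utf8. One way around the limit is to shorten the columns or index only a prefix of each one; the sketch suggests roughly 166 characters per column, though the real ceiling may be slightly lower once overhead is counted.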

+1

"The CHAR and VARCHAR columns will use 3 times more disk space."

Only if they are stuffed full of Latin-1 characters with ordinals above 127, and even those take two bytes each in UTF-8 rather than three. Otherwise, the extra space used by UTF-8 is minimal.
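
To get a feel for the numbers, the encoded sizes can be compared directly. This is a small Python sketch with made-up sample strings; it measures only the encoded text, not MySQL's actual on-disk row format, and `growth_factor` is a hypothetical helper name.

```python
# Sketch: how much larger text gets when encoded as UTF-8 instead of latin1.
# The sample strings are illustrative, not taken from any real dataset.
# This measures encoded text only, not MySQL's on-disk storage.

def growth_factor(text: str) -> float:
    """Ratio of UTF-8 byte length to latin1 byte length."""
    return len(text.encode("utf-8")) / len(text.encode("latin-1"))

samples = ["hello world", "smörgåsbord", "åäöÅÄÖ"]
for text in samples:
    latin1_size = len(text.encode("latin-1"))
    utf8_size = len(text.encode("utf-8"))
    print(text, latin1_size, utf8_size, round(growth_factor(text), 2))
```

Pure ASCII text does not grow at all, a typical Swedish word grows by a byte or two, and only a string made entirely of accented letters doubles in size.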

0

The collations are not always favorable. Characters with umlauts collate as equal to their non-umlauted versions, which is not always correct. You could go with utf8_bin, but then everything becomes case-sensitive as well.
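
The trade-off can be illustrated outside the database. The Python sketch below only approximates the two behaviours described here and is not MySQL's actual collation algorithm: `equal_insensitive` is a rough stand-in for a case-insensitive collation, `equal_binary` for utf8_bin, and all function names are hypothetical.

```python
# Sketch: accent- and case-insensitive comparison vs. binary comparison.
# This only approximates the behaviour described in the answer; it is
# NOT MySQL's actual collation algorithm.
import unicodedata

def fold(text: str) -> str:
    """Strip combining accents, then fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    return without_marks.casefold()

def equal_insensitive(a: str, b: str) -> bool:
    """Rough stand-in for a *_ci collation: accents and case are ignored."""
    return fold(a) == fold(b)

def equal_binary(a: str, b: str) -> bool:
    """Rough stand-in for utf8_bin: compare the encoded bytes."""
    return a.encode("utf-8") == b.encode("utf-8")

print(equal_insensitive("Müller", "muller"))  # umlaut and case both ignored
print(equal_binary("Müller", "muller"))       # umlaut now distinguishes them
print(equal_binary("Müller", "müller"))       # but so does letter case
```

The first comparison treats the two names as equal, which is the behaviour the answer calls "not always correct". The binary comparison fixes that, but also stops matching values that differ only in case.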

0

Source: https://habr.com/ru/post/1276572/

