How much UTF-8 text fits in a MySQL TEXT field?

According to the MySQL documentation, a TEXT column holds up to 65,535 bytes.

So, if that is a hard limit, it would really correspond to only about 32 thousand UTF-8 characters, right? Or is this one of those "fuzzy" limits where the people who wrote the documentation don't distinguish characters from bytes, and the column actually allows ~64k UTF-8 characters when it is set to something like utf8_general_ci?

+44
mysql utf-8
Dec 12 '10 at 2:40
3 answers

A TEXT column can hold up to 65,535 bytes.

In MySQL's utf8 character set, a character can take up to 3 bytes.

So your actual limit may be as low as 21,844 characters.

See the manual for more information: http://dev.mysql.com/doc/refman/5.1/en/string-type-overview.html

A variable-length string. M represents the maximum column length in characters. The range of M is 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. For example, utf8 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8 character set can be declared to be a maximum of 21,844 characters.
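The 21,844 figure in the quoted passage can be reproduced with a little arithmetic. The sketch below is only an illustration: it assumes the VARCHAR is the only column in the row, is declared NOT NULL, and needs a 2-byte length prefix. The constant and function names are made up for this example.

```python
# Rough arithmetic behind the 21,844 figure quoted from the manual.
# Assumptions (not spelled out in the quote): a single NOT NULL VARCHAR
# column with a 2-byte length prefix and no other columns in the row.
MAX_ROW_BYTES = 65_535       # maximum row size quoted from the manual
LENGTH_PREFIX_BYTES = 2      # assumed prefix for values longer than 255 bytes
MAX_BYTES_PER_CHAR = 3       # worst case for MySQL's utf8 character set


def max_varchar_chars(bytes_per_char: int = MAX_BYTES_PER_CHAR) -> int:
    """Largest declarable length if every character may need bytes_per_char bytes."""
    return (MAX_ROW_BYTES - LENGTH_PREFIX_BYTES) // bytes_per_char


print(max_varchar_chars())    # 3 bytes per character
print(max_varchar_chars(4))   # hypothetical 4 bytes per character
```

Under these assumptions the 3-byte case gives 21,844, and a 4-byte worst case would give 16,383.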

+70
Dec 12 '10 at 2:51

UTF-8 characters can take up to 4 bytes each, not 2 as you are expecting. UTF-8 is a variable-width encoding; the number of bytes depends on the number of significant bits in the Unicode code point:

  • 7 bits or fewer in the Unicode code point: 1 byte in UTF-8
  • 8 to 11 bits: 2 bytes in UTF-8
  • 12 to 16 bits: 3 bytes
  • 17 to 21 bits: 4 bytes
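The table above translates directly into code. The following is a minimal sketch (the function name `utf8_length` is invented for this example) that maps a code point's bit length to its UTF-8 byte count and cross-checks a few characters against Python's own encoder.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed to encode one code point in UTF-8, following the table above."""
    bits = code_point.bit_length()
    if bits <= 7:
        return 1
    if bits <= 11:
        return 2
    if bits <= 16:
        return 3
    if bits <= 21:
        return 4
    raise ValueError("code point outside the range covered by the table")


for ch in ("A", "é", "€", "😀"):
    # Cross-check the table against Python's built-in UTF-8 encoder.
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {utf8_length(ord(ch))} byte(s)")
```

The four sample characters cover the 1-, 2-, 3- and 4-byte cases respectively.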

The original UTF-8 specification allowed encoding Unicode values of up to 31 bits, taking up to 6 bytes in UTF-8 form. After UTF-8 became popular, the Unicode Consortium announced that it would never use code points beyond 2^21 - 1. This is now standardized as RFC 3629.

MySQL's utf8 character set currently (i.e. as of version 5.6) supports only the Unicode Basic Multilingual Plane, for which UTF-8 requires up to 3 bytes per character. This means that the current answer to your question is that a TEXT field can hold at least 21,844 characters.

Depending on how you look at it, the actual limits are higher or lower:

  • If you believe, as I do, that the BMP restriction will eventually be lifted in MySQL or one of its forks, you should not count on being able to store more than 16,383 characters in this field if your MySQL client allows arbitrary Unicode text input.

  • On the other hand, you can take advantage of the fact that UTF-8 is a variable-width encoding. If you know that your text is mostly plain English with only the occasional non-ASCII character, your effective limit in practice may come close to the maximum of 64 KB - 1 (65,535) characters.
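How far the limit moves with the character mix can be checked by counting encoded bytes. This is a rough sketch that only compares UTF-8 payload size against the 65,535-byte TEXT limit and ignores any storage overhead. The helper names are hypothetical, and because it counts payload bytes alone, the 3-byte case may come out one higher than the VARCHAR-derived figure quoted from the manual.

```python
TEXT_MAX_BYTES = 65_535  # TEXT column limit discussed above


def fits_in_text(value: str) -> bool:
    """True if the UTF-8 encoding of value stays within the TEXT byte limit."""
    return len(value.encode("utf-8")) <= TEXT_MAX_BYTES


def max_repeats(ch: str) -> int:
    """How many copies of one character fit, counting payload bytes only."""
    return TEXT_MAX_BYTES // len(ch.encode("utf-8"))


for ch in ("a", "é", "€", "😀"):
    print(repr(ch), len(ch.encode("utf-8")), "byte(s) ->", max_repeats(ch), "characters")

# Mostly-ASCII text gets close to the byte limit in characters.
sample = "plain English text " * 3000 + "é"
print(len(sample), "characters,", len(sample.encode("utf-8")), "bytes,", fits_in_text(sample))
```

A client that checks `fits_in_text` before inserting avoids relying on a fixed character count.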

+11
Dec 12 2018-12-12T00:

However, when the column is used as a primary key, MySQL assumes that each character of the declared column length adds 3 bytes to the key.

    mysql> alter table test2 modify code varchar(333) character set utf8;
    Query OK, 0 rows affected (0.05 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    mysql> alter table test2 modify code varchar(334) character set utf8;
    ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
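The cutoff between 333 and 334 in the session above is consistent with a simple worst-case calculation. The sketch below assumes the key length is just the declared character count times 3 bytes, checked against the 1000-byte limit from the error message. The names are illustrative and this is not MySQL's actual implementation.

```python
MAX_KEY_BYTES = 1000         # limit reported in the error message above
ASSUMED_BYTES_PER_CHAR = 3   # worst case reserved per utf8 character


def key_fits(declared_chars: int) -> bool:
    """True if declared_chars worst-case utf8 characters fit in the key limit."""
    return declared_chars * ASSUMED_BYTES_PER_CHAR <= MAX_KEY_BYTES


print(key_fits(333))  # 333 * 3 = 999 bytes
print(key_fits(334))  # 334 * 3 = 1002 bytes
```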

Granted, using long string columns as a primary key is usually bad practice, but I ran into this problem when working with the database of a commercial (!) product.

+1
Dec 15 '10 at 9:24