UTF-8 characters can take up to 4 bytes each, not 2 as you're supposing. UTF-8 is a variable-width encoding: the number of bytes depends on the number of significant bits in the Unicode code point:
- 7 bits or fewer in the Unicode code point: 1 byte in UTF-8
- 8 to 11 bits: 2 bytes in UTF-8
- 12 to 16 bits: 3 bytes
- 17 to 21 bits: 4 bytes
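To make the list above concrete, here is a minimal Python sketch that maps a code point's significant-bit count to its UTF-8 width and cross-checks the result against Python's built-in encoder. The function name `utf8_width` and the sample code points are my own choices for illustration, not anything from MySQL.

```python
def utf8_width(code_point: int) -> int:
    """Bytes UTF-8 needs for a code point, judged by its significant bits."""
    bits = code_point.bit_length()
    if bits <= 7:
        return 1
    if bits <= 11:
        return 2
    if bits <= 16:
        return 3
    if bits <= 21:
        return 4
    raise ValueError("beyond the 21-bit range covered by the list above")


# Cross-check against Python's own encoder at the edges of each range.
# (Surrogates U+D800..U+DFFF are skipped: they cannot be encoded on their own.)
for cp in (0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == utf8_width(cp)
    print(f"U+{cp:04X}: {cp.bit_length():2d} bits -> {len(encoded)} byte(s)")
```

The loop only samples the boundary values of each range, so treat it as a sanity check rather than an exhaustive proof.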
The original UTF-8 specification allowed encoding Unicode values of up to 31 bits, taking up to 6 bytes in UTF-8 form. After UTF-8 became popular, the Unicode Consortium declared that it would never use code points beyond 2²¹ − 1. This is now standardized as RFC 3629.
MySQL currently (i.e. version 5.6) only supports the Unicode Basic Multilingual Plane, for which UTF-8 needs up to 3 bytes per character. That means the current answer to your question is that your TEXT field can hold at least 21,844 characters.
Depending on how you look at it, the actual limits are higher or lower:
If you assume that the BMP restriction will eventually be lifted in MySQL or one of its forks, you shouldn't count on being able to store more than 16,383 characters in that field if your MySQL client allows arbitrary Unicode text input.
On the other hand, you can exploit the fact that UTF-8 is a variable-width encoding. If you know your text is mostly plain English with only the occasional non-ASCII character, your effective limit in practice may approach the maximum of 64 KB − 1 characters.
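The two views of the limit can be compared with a rough Python sketch. It assumes the byte budget is the 64 KB − 1 (65,535 bytes) mentioned above; `TEXT_LIMIT`, `worst_case_chars`, and `chars_that_fit` are hypothetical names I made up for illustration, and nothing here talks to MySQL itself.

```python
TEXT_LIMIT = 2**16 - 1  # 65,535 bytes; assumed byte budget (the "64 KB - 1" above)


def worst_case_chars(byte_limit: int, max_bytes_per_char: int) -> int:
    """Characters guaranteed to fit if every one needs the maximum width."""
    return byte_limit // max_bytes_per_char


def chars_that_fit(text: str, byte_limit: int = TEXT_LIMIT) -> int:
    """Count the leading characters of `text` whose UTF-8 bytes fit the budget."""
    used = 0
    for count, ch in enumerate(text):
        width = len(ch.encode("utf-8"))
        if used + width > byte_limit:
            return count  # `count` characters were accepted before this one
        used += width
    return len(text)


# Pessimistic view: every character takes the full 4 bytes.
print(worst_case_chars(TEXT_LIMIT, 4))  # 16383

# Optimistic view: pure ASCII, one byte per character.
print(chars_that_fit("a" * 70_000))  # 65535

# In between: mostly ASCII with an occasional 2-byte character.
mostly_ascii = ("plain English text " * 10 + "é") * 400
print(chars_that_fit(mostly_ascii))
```

Passing 3 instead of 4 to `worst_case_chars` gives the BMP-only case, which is in line with the "at least 21,844" figure above. If you need a hard guarantee, measure the encoded length of the actual text before inserting it rather than counting characters.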
Warren Young, 2018-12-12