UTF-8 vs ASCII text

Why do SQL databases use UTF-8 encoding? Do UTF-8 and ASCII both use 8 bits to store a character?

+4
3 answers

UTF-8 is used to support a wide range of characters. In UTF-8, up to 4 bytes can be used to represent a single character.
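To make the "up to 4 bytes" point concrete, here is a small Python sketch that measures how many bytes a few characters take in UTF-8. The four sample characters are my own picks for illustration, not something from the question.

```python
# Sample characters (chosen for illustration): one from each UTF-8 length class.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji

# Map each character to the number of bytes its UTF-8 encoding needs.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in utf8_lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
```

The plain Latin letter needs one byte, while the emoji needs the full four.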

Joel Spolsky wrote an article on this subject that you may want to read:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

+8

For "normal" (ASCII) characters, only 8 bits are used. Characters that do not fit in a single byte take two, three, or four bytes. This makes UTF-8 a variable-length encoding.

Wikipedia has a good article on UTF-8.

ASCII defines only 128 characters, so it needs just 7 bits. In practice it is usually stored with 8 bits per character. RS-232 (old serial communication) can be used with 7-bit bytes.
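A quick way to check this from Python: pure ASCII text produces exactly the same bytes under both encodings, and every byte has its top bit clear. The query string and the word "café" below are made-up examples, not taken from the question.

```python
# Hypothetical example string containing only ASCII characters.
text = "SELECT name FROM users"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII input, both encodings yield identical bytes.
same_bytes = ascii_bytes == utf8_bytes

# Every byte fits in 7 bits (value below 0x80), even though it is stored in 8.
fits_in_7_bits = all(b < 0x80 for b in utf8_bytes)

# A non-ASCII character cannot be encoded as ASCII at all.
try:
    "caf\u00e9".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(same_bytes, fits_in_7_bits, ascii_ok)
```

So for ASCII-only data, storing it as UTF-8 costs nothing extra; the difference only shows up once other characters appear.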

+1

ASCII can only represent a limited set of characters (128). It is not very useful for any language that is not based on the Latin alphabet. UTF-8, however, is an encoding of Unicode (UCS-4) and can represent almost any language. It does this by combining several bytes to represent one character (or, more precisely, one code point).
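As a rough illustration of how several bytes are combined into one code point, here is a hand-written encoder in Python. The function name `encode_utf8` is my own, and this is only a sketch: it does not reject surrogates or values above U+10FFFF, which a real encoder must do.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one code point as UTF-8 (sketch; no validation of the input)."""
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Compare against Python's built-in encoder for a few code points.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert encode_utf8(cp) == chr(cp).encode("utf-8")
```

The lead byte tells the decoder how many bytes follow, and each continuation byte starts with the bits `10`, so a decoder can always tell where a character begins.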

0

Source: https://habr.com/ru/post/1308783/

