When computers were first created, they worked mainly with the characters used in English, which led to the 7-bit US-ASCII standard.
However, there are many different written languages in the world, and ways had to be found to represent them on computers.
The first method works fine if you restrict yourself to a particular language: use a culture-specific encoding, such as ISO-8859-1, which can represent Latin/Western-European characters in 8 bits, or GB2312 for Chinese characters. A minimal sketch of this in Java is shown below.
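As a rough illustration (the string literals are arbitrary, and GB2312 availability depends on the JRE's installed charsets), this sketch encodes text with two such culture-specific encodings:

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class LegacyEncodings {
    public static void main(String[] args) {
        String latin = "caf\u00E9"; // "café"
        // ISO-8859-1 stores each Latin-1 character in a single byte.
        byte[] iso = latin.getBytes(Charset.forName("ISO-8859-1"));
        System.out.println(Arrays.toString(iso)); // [99, 97, 102, -23]; 'é' is byte 0xE9

        String chinese = "\u4E2D\u6587"; // "中文"
        // GB2312 uses two bytes per Chinese character (if the JVM provides this charset).
        byte[] gb = chinese.getBytes(Charset.forName("GB2312"));
        System.out.println(gb.length); // 4
    }
}
```

Note that neither encoding can represent the other's characters, which is exactly the limitation the second method addresses.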
The second method is a bit more complicated, but it can in principle represent every character in the world: this is the Unicode standard, in which each character of each language has a specific code point. However, given the large number of existing characters (about 109,000 in Unicode 5), Unicode characters are typically described using a three-byte representation (one byte for the Unicode plane and two bytes for the character code within that plane).
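For example, in Java (which exposes Unicode code points directly on strings), you can inspect a character's code point independently of any byte encoding; the escape sequences below are just illustrative characters:

```java
public class CodePoints {
    public static void main(String[] args) {
        // Every character has a fixed Unicode code point, independent of encoding.
        System.out.printf("U+%04X%n", (int) 'A');                 // U+0041
        System.out.printf("U+%04X%n", (int) '\u00E9');            // U+00E9 (é)
        System.out.printf("U+%04X%n", "\u4E2D".codePointAt(0));   // U+4E2D (中)

        // Characters outside the Basic Multilingual Plane have code points above 0xFFFF.
        String emoji = "\uD83D\uDE00"; // surrogate pair for U+1F600
        System.out.printf("U+%04X%n", emoji.codePointAt(0));      // U+1F600
    }
}
```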
To maximize compatibility with existing code (some of which still deals only with ASCII text), the UTF-8 encoding was designed as a way to store Unicode characters using minimal space, as described in Joachim Sauer's answer.
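A small sketch of that space behaviour in Java (the sample characters are arbitrary): ASCII characters stay at one byte, so a plain-ASCII file is already valid UTF-8, while other characters take two to four bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Width {
    public static void main(String[] args) {
        // UTF-8 is variable-width: 1 byte for ASCII, 2 to 4 bytes otherwise.
        System.out.println("A".getBytes(StandardCharsets.UTF_8).length);            // 1 byte  (U+0041)
        System.out.println("\u00E9".getBytes(StandardCharsets.UTF_8).length);       // 2 bytes (U+00E9, é)
        System.out.println("\u4E2D".getBytes(StandardCharsets.UTF_8).length);       // 3 bytes (U+4E2D, 中)
        System.out.println("\uD83D\uDE00".getBytes(StandardCharsets.UTF_8).length); // 4 bytes (U+1F600)
    }
}
```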
Thus, a file encoded with a language-specific encoding such as ISO-8859-1 can usually be edited or read only by software (and people) that understand those languages, whereas UTF-8, where available, is highly interoperable and culture-independent. The current trend is for UTF-8 to replace other encodings, even though this requires work from software developers, since UTF-8 strings are more difficult to process than strings in fixed-width encodings.
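To make that last point concrete, here is a small sketch (again in Java, with an arbitrary sample string): with a fixed-width encoding the character count is just the byte count divided by the width, whereas UTF-8 bytes must be decoded before you can count or slice characters, and decoding with the wrong charset produces mojibake.

```java
import java.nio.charset.StandardCharsets;

public class VariableWidth {
    public static void main(String[] args) {
        String text = "\u4E2D\u6587ab"; // "中文ab"
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // 8 bytes for 4 characters: you cannot index characters by byte offset.
        System.out.println(utf8.length); // 8

        // Counting characters requires decoding the bytes first.
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(decoded.codePointCount(0, decoded.length())); // 4

        // Decoding the same bytes with the wrong charset garbles the text,
        // which is why a shared, culture-independent encoding matters.
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1)); // mojibake
    }
}
```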