Why is everyone using latin1?

Question

Someone just told me that utf8 is a variable-length encoding that uses 1 to 3 bytes per character.

So why is everyone still using latin1? If the same text is stored in utf8, it also takes 1 byte, but utf8 has the advantage of supporting a much larger character set.

Is there a hidden reason everyone uses latin1?
What are the disadvantages of using utf8 vs. latin1?

php mysql internationalization phpmyadmin

David19801 Jan 25 '11 at 11:09

3 answers

1) Efficiency. With a fixed length, jumping to the nth character of a string is easy. With a variable length, you have to walk through every character from the start of the string to find out how long each one is. The only way to get this performance with Unicode is UTF-32 (every character is 4 bytes), but that takes up more memory.

2) All characters with diacritics (accents) in Latin-1 lie in the range 128-255 and are therefore encoded with more than one byte in UTF-8.

3) Many programmers do not know how to use Unicode.
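
Points 1 and 2 can be illustrated with a minimal Python sketch (Python is our choice; the thread itself contains no code). Finding the nth character in UTF-8 means scanning from the start and counting only lead bytes, since continuation bytes have the bit pattern 10xxxxxx, while in ISO 8859-1 the byte offset is simply n. The helpers `utf8_char_offset` and `latin1_char_offset` are hypothetical names used only for illustration.

```python
def utf8_char_offset(data: bytes, n: int) -> int:
    """Byte offset of the n-th (0-based) character in UTF-8 data.

    UTF-8 is variable-length, so we scan from the start and count only
    lead bytes; continuation bytes have the bit pattern 10xxxxxx.
    """
    count = -1
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # lead byte: a new character starts here
            count += 1
            if count == n:
                return offset
    raise IndexError("character index out of range")


def latin1_char_offset(data: bytes, n: int) -> int:
    """In ISO 8859-1 every character is exactly one byte, so the offset is n."""
    if not 0 <= n < len(data):
        raise IndexError("character index out of range")
    return n


text = "naïve café"
utf8 = text.encode("utf-8")      # ï and é take 2 bytes each (point 2)
latin1 = text.encode("latin-1")  # every character takes 1 byte

print(len(utf8), len(latin1))         # → 12 10
print(utf8_char_offset(utf8, 9))      # → 10 (found by scanning)
print(latin1_char_offset(latin1, 9))  # → 9 (direct arithmetic)
```

The UTF-8 lookup is linear in the position, the ISO 8859-1 lookup is constant time; that is the trade-off point 1 describes.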

Scharron Jan 25 '11 at 11:18

It could be a "reason":

Everyone uses latin1 because everyone else does.

It's really annoying to mix different encodings, so you go with whatever the rest of the system uses.

(I'm not saying that this is a good reason, but I think some people use it)

Nanne Jan 25 '11 at 11:11

Gumbo · Accepted Answer · 2011-01-25T11:35:30+0000

ISO 8859-1 is (at least de facto) the default character encoding for several standards, such as HTTP (at least for textual content):

"When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value." (RFC 2616, section 3.7.1)
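
A minimal Python sketch of how a client might apply this HTTP/1.1 rule when picking a charset from a `Content-Type` header. The function name is hypothetical and the parser is deliberately simplified (it splits on `;` and ignores quoted values containing `;`). Note that this default comes from RFC 2616; later HTTP specifications (RFC 7231) removed it, so treat it as the historical behavior the answer describes.

```python
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset for an HTTP Content-Type value.

    text/* without an explicit charset parameter defaults to ISO-8859-1,
    as in RFC 2616, section 3.7.1. Simplified parser: it splits on ';'
    and does not handle quoted values that themselves contain ';'.
    """
    media_type, *params = [part.strip() for part in content_type.split(";")]
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    if media_type.lower().startswith("text/"):
        return "iso-8859-1"  # RFC 2616 default for text/* media types
    return None  # no charset given and no default for non-text types


print(charset_from_content_type("text/html"))                   # → iso-8859-1
print(charset_from_content_type('text/html; charset="UTF-8"'))  # → utf-8
print(charset_from_content_type("application/octet-stream"))    # → None
```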

The reason ISO 8859-1 was chosen is probably that it is a superset of US-ASCII, the base character set for Internet technologies. And since the World Wide Web was invented and developed at CERN in Geneva, Switzerland, that may explain why Western European characters were chosen for the remaining 128 characters.

When the Unicode standard was developed, the ISO 8859-1 character set was used as the basis of the Unicode character set (the Universal Character Set), so the first 256 code points are identical to the ISO 8859-1 characters. This was probably done because of the importance of ISO 8859-1 for the Internet, as it was already the default character encoding for many technologies.
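
This identity can be checked directly in Python, whose `latin-1` codec maps each byte to the code point with the same numeric value. A minimal sketch; `latin1_matches_unicode` is a hypothetical helper:

```python
def latin1_matches_unicode() -> bool:
    """Check that decoding every byte 0x00-0xFF as ISO 8859-1 yields the
    Unicode code point with the same numeric value."""
    decoded = bytes(range(256)).decode("latin-1")
    return len(decoded) == 256 and all(
        ord(ch) == value for value, ch in enumerate(decoded)
    )


print(latin1_matches_unicode())  # → True
```

A side effect of this one-to-one mapping is that any byte sequence decodes as ISO 8859-1 without error.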

Now, to discuss the benefits of ISO 8859-1 over UTF-8, we need to look at the underlying character sets and the encoding schemes used to encode their characters:

ISO 8859-1 contains 256 characters, and the code point of each character is mapped directly to its binary representation. Thus, 123₁₀ is encoded as 01111011₂.
UTF-8 uses a variable-length prefix encoding scheme, where the prefix bits indicate the length of the encoded word. UTF-8 is used to encode the characters of the Universal Character Set, which has 1,114,112 code points (U+0000 to U+10FFFF). The first 128 characters (0x00-0x7F) require 1 byte, characters in 0x80-0x7FF require 2 bytes, characters in 0x800-0xFFFF require 3 bytes, and characters in 0x10000-0x10FFFF require 4 bytes.
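
Both schemes can be sketched in a few lines of Python. The hand-written encoder below follows the UTF-8 prefix patterns and byte-length ranges listed above (limited to U+10FFFF and excluding surrogates, as RFC 3629 requires); it is for illustration only, and in practice you would use `str.encode("utf-8")`. The name `utf8_encode_code_point` is hypothetical.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one code point with UTF-8's prefix scheme."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:   # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# ISO 8859-1: the byte is the code point itself, e.g. 123 ("{") -> 01111011.
print(format("{".encode("latin-1")[0], "08b"))  # → 01111011

# UTF-8: the number of bytes depends on the code point's range.
for cp in (0x7B, 0xE9, 0x20AC, 0x1F600):
    encoded = utf8_encode_code_point(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with Python's codec
    print(f"U+{cp:04X}: {len(encoded)} byte(s) {encoded.hex(' ')}")
```

The loop should report 1, 2, 3 and 4 bytes respectively (`7b`, `c3 a9`, `e2 82 ac`, `f0 9f 98 80`), one code point from each range.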

So the difference lies in the range of encodable characters on the one hand and the length of the encoded words on the other.

Thus, the choice of the “correct” character encoding depends on your needs: if you only need ISO 8859-1 characters (or US-ASCII as a subset), use ISO 8859-1, since it requires only one byte per character, as opposed to UTF-8, where characters 128-255 require two bytes. If you need characters outside ISO 8859-1, use UTF-8.
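
One rough way to apply this rule of thumb is to compare the encoded sizes of your actual data. A minimal Python sketch; `encoded_sizes` is a hypothetical helper, and it only counts raw bytes, not how a particular database stores them:

```python
from typing import Dict, Optional


def encoded_sizes(text: str) -> Dict[str, Optional[int]]:
    """Return the encoded size in bytes for UTF-8 and ISO 8859-1.

    ISO 8859-1 gets None when the text contains characters it cannot represent.
    """
    sizes: Dict[str, Optional[int]] = {"utf-8": len(text.encode("utf-8"))}
    try:
        sizes["latin-1"] = len(text.encode("latin-1"))
    except UnicodeEncodeError:
        sizes["latin-1"] = None
    return sizes


print(encoded_sizes("hello"))         # → {'utf-8': 5, 'latin-1': 5}
print(encoded_sizes("Crème brûlée"))  # → {'utf-8': 15, 'latin-1': 12}
print(encoded_sizes("€100"))          # → {'utf-8': 6, 'latin-1': None}
```

The euro sign is not part of ISO 8859-1 at all, which is exactly the case where the answer recommends UTF-8.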
