Confusion about character encodings!

I'm having some problems with the differences between UTF-8, UTF-16, ASCII and ANSI. After some research I have a rough idea, but it would be very helpful if someone could explain the differences between them accurately (including how the bytes of a typical character look in each).

My question comes down to:

1) How does each of the above store characters as bytes?
2) What are the differences between these standards?
3) What is a code page?
4) How do you convert characters between the various encodings?

Many thanks :)

+3
6 answers

I found that Joel's article on Unicode explains this very well. In particular, it covers the history (essential for this subject), the encodings (UTF-8, UTF-16, etc.) and code pages.

+12

1: Each of them maps characters to byte values in its own way; see the individual descriptions under 2.

2: In short:

ASCII
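A 7-bit encoding that defines 128 characters (codes 0-127): English letters, digits, punctuation and control characters. Each character takes one byte with the top bit unused; for example, 'A' is 0x41.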
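A minimal Python 3 sketch of the ASCII case, showing one byte per character and what happens with a character outside the 128-character set:

```python
# ASCII: 128 characters (codes 0-127), one byte per character.
text = "Hi A"
data = text.encode("ascii")
print(data.hex(" "))  # 48 69 20 41

# Every ASCII byte has its top bit clear.
assert all(b < 0x80 for b in data)

# Characters outside the 128-character set cannot be encoded at all.
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)
```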

ANSI
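Not a single standard: the name is used loosely (mainly on Windows) for the 8-bit code pages, such as Windows-1252, that extend ASCII. The first 128 values are the same as ASCII; the other 128 hold extra characters, and which ones depends on the code page. In Western code pages like Windows-1252 every character is still a single byte.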
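To illustrate, here is a small Python 3 sketch that assumes Windows-1252 (Python codec name "cp1252") as a typical Western "ANSI" code page; a machine with a different locale may use a different one:

```python
# Assumption: Windows-1252 stands in for "ANSI" here.
# The lower half is identical to ASCII...
assert "A".encode("cp1252") == "A".encode("ascii") == b"\x41"

# ...and the upper half holds extra characters, still one byte each.
for ch in "é€":
    encoded = ch.encode("cp1252")
    print(ch, encoded.hex(), len(encoded))  # é e9 1, then € 80 1
```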

UTF-8
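A variable-length encoding of Unicode that uses 1 to 4 bytes per character. It is backward compatible with ASCII: the first 128 characters are encoded as the same single bytes, so 'A' is still 0x41, while 'é' (U+00E9) becomes the two bytes 0xC3 0xA9.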
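A quick Python 3 sketch printing the UTF-8 bytes for characters that need 1, 2, 3 and 4 bytes respectively (the sample characters are just an illustrative choice):

```python
# UTF-8 uses 1 to 4 bytes per character; ASCII characters keep their single byte.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, and an emoji
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")
```

This prints 41, c3 a9, e2 82 ac and f0 9f 98 80 for the four characters.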

UTF-16
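Like UTF-8, a variable-length encoding of Unicode, but built from 16-bit code units instead of 8-bit ones. Most common characters take one unit (2 bytes) and the rest take two units (4 bytes, a surrogate pair). It is not ASCII-compatible: 'A' becomes 0x0041, i.e. the bytes 41 00 or 00 41 depending on byte order.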
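A Python 3 sketch of the same idea for UTF-16, showing both byte orders, a surrogate pair, and the byte order mark (BOM) that Python's plain "utf-16" codec adds:

```python
for ch in ["A", "\u20ac", "\U0001F600"]:  # A, €, and an emoji outside the BMP
    le = ch.encode("utf-16-le")  # little-endian, no BOM
    be = ch.encode("utf-16-be")  # big-endian, no BOM
    print(f"U+{ord(ch):04X}: LE {le.hex(' ')} | BE {be.hex(' ')}")

# The plain "utf-16" codec prepends a BOM in the platform's native byte order.
with_bom = "A".encode("utf-16")
print(with_bom.hex(" "))
```

The emoji comes out as the surrogate pair D83D DE00, i.e. 4 bytes, while 'A' and '€' take 2 bytes each.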

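3: A code page is a table that assigns characters to byte values (in other words, a character set together with its encoding). Unicode does not need code pages, because it covers the characters of all scripts in one set. An 8-bit ANSI code page can hold at most 256 characters, so different languages need different code pages, and the same byte can mean a different character depending on which code page the text was written in.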
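The "same byte, different character" problem is easy to see in Python 3 by decoding one byte with several code pages (the three code pages here are an arbitrary illustrative choice):

```python
raw = bytes([0xE9])
for codepage in ("cp1252", "cp1251", "iso8859_7"):
    # Western European, Cyrillic and Greek code pages give different characters.
    print(codepage, raw.decode(codepage))
```

The byte 0xE9 comes out as 'é', 'й' and 'ι' respectively.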

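4: To convert, decode the bytes from the source encoding into characters (Unicode code points), then encode those characters into the target encoding. Most languages and platforms provide library functions for both steps. Because UTF-8 is compatible with ASCII, the first 128 characters are identical in both, so text containing only ASCII characters needs no conversion at all.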
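A minimal Python 3 sketch of decode-then-encode; `convert` is a hypothetical helper name for illustration, not a standard library function:

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from `source` into text, then encode that text as `target`."""
    return data.decode(source).encode(target)

latin = "café".encode("cp1252")           # b'caf\xe9'
utf8 = convert(latin, "cp1252", "utf-8")  # b'caf\xc3\xa9'
print(latin, "->", utf8)

# Pure ASCII bytes come out of an ASCII -> UTF-8 conversion unchanged.
assert convert(b"plain text", "ascii", "utf-8") == b"plain text"
```

Going through text (str) in the middle is what makes this work for any pair of encodings Python knows about, as long as the target can represent every character.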

Avoid ad-hoc conversion code; use an existing, well-tested library instead.

+4

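The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) (from Joel on Software)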

+2

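O'Reilly's book CJKV Information Processing: although it focuses on CJKV (Chinese, Japanese, Korean and Vietnamese) text, it covers encodings in depth. " ! *! ** # @Euro, ?".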

0

On Unix there are the recode and iconv command-line tools, plus the iconv library function (man 3 iconv) for C and C++.
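For comparison, here is a rough Python 3 equivalent of converting a whole file, roughly the job iconv does with its -f (from) and -t (to) options. `recode_file` is a hypothetical helper, and the demo file contents are made up:

```python
import tempfile
from pathlib import Path

def recode_file(src: Path, dst: Path, src_enc: str, dst_enc: str) -> None:
    """Re-encode a text file from src_enc to dst_enc (hypothetical helper)."""
    # newline="" keeps line endings exactly as they are in the input.
    with open(src, encoding=src_enc, newline="") as fin, \
         open(dst, "w", encoding=dst_enc, newline="") as fout:
        for line in fin:
            fout.write(line)

# Demo: convert a small Windows-1252 file to UTF-8 in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "in.txt")
    dst = Path(tmp, "out.txt")
    src.write_bytes("naïve café\r\n".encode("cp1252"))
    recode_file(src, dst, "cp1252", "utf-8")
    result = dst.read_bytes()

print(result)
```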

In Perl, use the Encode module (for example, use Encode; print encode("utf-8", "\xabfoo")). In Python 2, use unicode.encode / str.decode (for example, print u'\xabfoo'.encode('utf-8')).
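The Python example above is Python 2 (u'' literal, print statement). Assuming Python 3, where str holds text and bytes holds encoded data, the same round trip looks like this:

```python
# Python 3: str holds text, bytes holds encoded data.
s = "\xabfoo"                      # '«foo': U+00AB followed by "foo"
encoded = s.encode("utf-8")        # str -> bytes
print(encoded)                     # b'\xc2\xabfoo'
decoded = encoded.decode("utf-8")  # bytes -> str
assert decoded == s
```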

0

A few additional points:

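  • UTF-8 is compatible with ASCII: the first 128 characters (codes 0-127) are encoded exactly as in ASCII, which is not true of the other UTF encodings. So if your text contains only ASCII characters, its ASCII and UTF-8 encodings are byte-for-byte identical.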

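    However, UTF-8 is variable-length: characters above 127 are stored as sequences of 2 to 4 bytes, all with the high bit set, so none of them can be mistaken for an ASCII byte. How do you tell where a character starts? The lead byte of an n-byte sequence begins with n 1-bits followed by a 0, and each of the remaining n-1 bytes begins with the bits 10.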


  • UTF-8 is also the default encoding for XML.
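A small Python 3 sketch of the UTF-8 bit patterns described in these points. `utf8_byte_kind` is a hypothetical helper; it only classifies bytes by their high bits and does not validate, so invalid bytes such as 0xC0, 0xC1 and 0xF5-0xFF are not rejected:

```python
def utf8_byte_kind(b: int) -> str:
    """Classify one byte by its high bits, following the UTF-8 bit patterns."""
    if b < 0x80:              # 0xxxxxxx: plain ASCII
        return "ascii"
    if b & 0xC0 == 0x80:      # 10xxxxxx: continuation byte
        return "continuation"
    # Lead byte: the number of leading 1 bits is the sequence length.
    n = 0
    mask = 0x80
    while b & mask:
        n += 1
        mask >>= 1
    return f"lead byte of a {n}-byte sequence"

for b in "A\u20ac".encode("utf-8"):  # 'A' then '€' -> 41 e2 82 ac
    print(f"{b:08b} {utf8_byte_kind(b)}")

# ASCII-only bytes decode identically as ASCII and as UTF-8.
assert b"hello".decode("utf-8") == b"hello".decode("ascii")
```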

0

Source: https://habr.com/ru/post/1708808/

