First published in What Every Developer Should Know About Character Encoding.
If you write code that reads or writes a text file, you probably need this.
Let's start with two key points.
1. Unicode does not solve this problem for us (yet).
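2. Every text file is encoded. There is no such thing as an unencoded file, and there is no "general" encoding.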
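One caveat: most Americans can get by without taking this into account most of the time. The first 127 values map to the same set of characters (more precisely, glyphs) in the vast majority of encoding schemes. As long as the text uses only A-Z, with no accents or other special characters, everything works. But the moment you carry that assumption into an HTML or XML file that contains characters above 127, the trouble starts.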
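The computer industry started out when disk space and memory were at a premium. Anyone who suggested using 2 bytes per character instead of one would have been laughed at. In fact, we are lucky that the byte settled at 8 bits, or we might have had fewer than 256 values per character. Numerous character sets (code pages) were developed early on. In the end, almost everyone used a standard family of code pages in which the first 127 values were identical everywhere and the upper half was unique to each set. There were sets for America/Western Europe, Central Europe, Russia, and so on.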
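To make the code page idea concrete, here is a minimal sketch using Python's built-in codecs. The three Windows code pages (cp1252, cp1251, cp1253) are my choice of examples, not something the article names: the same byte above 127 decodes to three different letters, while plain ASCII bytes decode identically everywhere.

```python
# One byte value above 127, decoded under three legacy single-byte code pages.
raw = bytes([0xE4])
code_pages = ("cp1252", "cp1251", "cp1253")  # Western European, Cyrillic, Greek

decoded = {name: raw.decode(name) for name in code_pages}
for name, ch in decoded.items():
    print(name, repr(ch))

# Bytes below 128 decode to the same text under all three code pages.
ascii_bytes = b"Hello, world"
ascii_views = {ascii_bytes.decode(name) for name in code_pages}
print(ascii_views)
```

The byte itself carries no hint about which reading is the right one; that information has to come from somewhere else.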
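For Asia, 256 characters were not enough, so part of the range 128 - 255 was used for what is called DBCS (double-byte character sets). For each lead byte in that upper range, a second byte selects one of 256 characters, which gives up to 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS code page.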
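A small sketch of the double-byte scheme, assuming Shift_JIS (one Japanese DBCS code page, picked here only for illustration). Each kanji takes two bytes and its lead byte sits in the upper range, while ASCII stays at one byte. The 128 * 256 figure is the theoretical ceiling described above; real code pages use only part of that space.

```python
# Two kanji encoded in Shift_JIS: each one occupies two bytes.
text = "日本"
encoded = text.encode("shift_jis")

# The first byte of each pair is the lead byte; it falls in the range 128 - 255.
lead_bytes = [encoded[0], encoded[2]]
print([hex(b) for b in encoded])

# ASCII characters are still a single byte in the same code page.
ascii_encoded = "A".encode("shift_jis")

# Theoretical number of extra characters: 128 lead bytes times 256 second bytes.
extra_capacity = 128 * 256
print(extra_capacity)
```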
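For a while this worked well. Operating systems, applications, and so on were mostly set to use one specified code page. Then the Internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, with each party entering data according to their own country's conventions, broke that model.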
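Fast forward to today. The two file formats where this is easiest to explain, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally declare its character encoding in its header metadata. If it is not declared, most programs assume UTF-8, but not every program follows that convention. If the encoding is not specified and the program reading the file guesses wrong, the file will be misread.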
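Point 1 - Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never contain characters outside the range 1 - 127.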
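Now let's look at UTF-8, because the way it works gets people into a lot of trouble. UTF-8 became popular for two reasons. First, it matches the standard code pages for the first 127 characters, so most existing HTML and XML was already valid UTF-8. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still on dial-up modems.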
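UTF-8 borrowed from the DBCS design of the Asian code pages. The first 128 values are all single-byte characters. For the next most common set, it uses a block in the upper 128 values as the start of a two-byte sequence, which yields more characters. But wait, there's more. For less common characters, a lead byte is followed by a series of second bytes, each of which leads to a third byte, and those three bytes together define the character. The original design went up to 6-byte sequences (the current standard stops at 4). With this MBCS (multi-byte character set) you can write the equivalent of every Unicode character and, unless what you are writing is a list of seldom-used Chinese characters, do it in fewer bytes than a fixed-width encoding would need.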
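The variable-length scheme is easy to see by encoding a few characters and counting bytes. This is only an illustrative sketch; the four sample characters are my own picks.

```python
# Characters from progressively less common ranges and their UTF-8 lengths.
samples = ["A", "Γ", "€", "😀"]

lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch in samples:
    # Show the raw bytes so the lead byte and continuation bytes are visible.
    print(repr(ch), lengths[ch], ch.encode("utf-8").hex(" "))

# The lead byte of a multi-byte sequence is always 128 or above.
gamma_bytes = "Γ".encode("utf-8")
```

In Python, `str.encode("utf-8")` never produces more than 4 bytes per character, in line with the current standard.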
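Here is what everyone trips over. They have an HTML or XML file, it works fine, and they open it in a text editor. Using the code page for their region, they insert a character such as Γ and save the file. Of course it must be correct, since the text editor shows it correctly. But feed it to any program that reads according to the declared encoding, and that byte is now the first byte of a two-byte sequence. You get either a different character or, if the next byte is not a legal value after that lead byte, an error.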
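Both outcomes can be reproduced in a few lines. This sketch assumes the editor saved with the Greek code page cp1253 (my assumption for the example) and that the reader decodes as UTF-8; `read_as_utf8` is a hypothetical helper, not part of any library.

```python
def read_as_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, reporting a decode failure instead of raising."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"

# In cp1253 the letter Γ is the single byte 0xC3, which UTF-8 treats as a lead byte.
gamma_byte = "Γ".encode("cp1253")

# Case 1: the next byte is not a legal continuation byte, so decoding fails.
saved_a = "Γ is gamma".encode("cp1253")
result_a = read_as_utf8(saved_a)

# Case 2: the next byte happens to be a legal continuation byte,
# so decoding succeeds but silently yields a different character.
saved_b = "Γ©".encode("cp1253")
result_b = read_as_utf8(saved_b)

print(result_a)
print(repr(result_b))
```

The second case is the nastier one, because nothing fails and the wrong character simply shows up downstream.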
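Point 2 - Always create HTML and XML in a program that writes it out correctly using the declared encoding. If you must create it with a text editor, view the final file in a browser.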
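Now, what about when the code you are writing reads or writes a file? We are not talking about binary/data files that you write in your own format, but files that are considered text files. Java, .NET, and so on all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Take what is actually a very difficult example: your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume the local code page. Many others assume that all characters are in the range 0 - 127 and choke on anything else.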
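Here is a key point about these text files: every program is still using an encoding. It may not set one explicitly in code, but by definition an encoding is being used.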
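Point 3 - Always set the encoding when you read and write text files. Not just for HTML and XML, but even for files like source code. It is fine to set it to the default code page, but set it explicitly.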
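In Python, for example, that means passing `encoding=` to `open()` on both the write and the read side rather than relying on the platform default. A minimal sketch, with a temporary file and sample text of my own choosing:

```python
import os
import tempfile

text = "Γ = gamma"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")  # hypothetical file name

    # Write with an explicit encoding instead of the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading with the same explicit encoding round-trips the text.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

    # Reading with a different encoding produces garbage, not an error.
    with open(path, "r", encoding="latin-1") as f:
        wrong = f.read()

print(repr(round_trip))
print(repr(wrong))
```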
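Point 4 - Use the most complete encoder possible. You can write your own XML as a text file encoded as UTF-8. But if you write it with an XML encoder, it will include the encoding in the metadata and you cannot get it wrong. (It may also add the byte-order preamble to the file.)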
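As one possible illustration, Python's standard `xml.etree.ElementTree` can act as the "XML encoder": it encodes the text and writes the matching declaration in one step. The element name and content below are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Build a tiny document containing non-ASCII text.
root = ET.Element("greeting")
root.text = "Γειά"

# Let the XML writer encode the bytes and emit the matching declaration.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()
print(xml_bytes)

# Any conforming parser can now read the encoding from the file itself.
parsed = ET.fromstring(xml_bytes)
print(parsed.text)
```

Because the declaration and the actual bytes come from the same call, they cannot drift apart the way they can when the XML is assembled by hand.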
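OK, you are reading and writing files correctly, but what about inside your code? This is where it is easy: Unicode. That is what the encoders in the Java and .NET runtimes are designed for. You read a file and get Unicode. You write Unicode and get an encoded file. That is why char is a 16-bit type, a distinct core type meant for characters. You probably have this right already, because today's languages give you little choice in the matter.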
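The same pattern, sketched in Python: decode to Unicode at the input boundary, work with text internally, and encode again at the output boundary. `transcode` is a hypothetical helper written for this example, and the cp1251 sample data is my own.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert encoded bytes from one encoding to another via Unicode text."""
    text = data.decode(source_encoding)   # bytes -> Unicode (input boundary)
    return text.encode(target_encoding)   # Unicode -> bytes (output boundary)

# Bytes as they might arrive from a file saved with the Cyrillic code page.
russian_cp1251 = "Привет".encode("cp1251")

as_utf8 = transcode(russian_cp1251, "cp1251", "utf-8")
print(len(russian_cp1251), len(as_utf8))
```

Everything between the two boundaries deals only with text, so no code in the middle needs to know which code page the file came from.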
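Point 5 - (For developers using languages that have been around a while) - Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.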
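I think there are two key points to keep in mind. First, make sure you take the encoding into account for text files. Second, this is all actually quite easy and straightforward. People rarely get the use of an encoding wrong; it is when they ignore the issue that they get into trouble.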