What is the difference between Unicode code points and Unicode scalar values?

I see the two terms used (apparently) interchangeably in many cases - are they the same or different? Does it also depend on whether the language speaks of UTF-8 (like Rust) or UTF-16 (like Java / Haskell)? Is the definition of the code point / scalar value somehow dependent on the encoding scheme?

1 answer

First, consider definitions D9, D10, and D10a in Section 3.4, Characters and Encoding:

D9 Unicode codespace: A range of integers from 0 to 10FFFF₁₆.

D10 Code point: Any value in the Unicode codespace.

β€’ .

...

D10a Code point type: Any of the seven fundamental classes of code points in the standard: Graphic, Format, Control, Private-Use, Surrogate, Noncharacter, Reserved.

[emphasis added]

Of those seven code point types, the one that matters here is Surrogate. Surrogate code points are the reason the standard needs the separate term "scalar value".

Now look at definition D76 in Section 3.9, Unicode Encoding Forms:

D76 Unicode scalar value: Any Unicode code point except high-surrogate and low-surrogate code points.

β€’ Unicode 0 D7FF 16 E000 16 10FFFF 16, .

Surrogates are explained in Section 3.8, which D76 references. The short version: surrogate code points exist only for the benefit of UTF-16. A 16-bit code unit can address only 2¹⁶ = 65536 values, but the codespace contains 1114112 code points, so UTF-16 represents every code point above FFFF₁₆ with a pair of surrogate code units (a high surrogate followed by a low surrogate). UTF-8 has no need for surrogates; it uses a variable number of bytes (1 to 4) per scalar value and encodes each one directly.
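
To illustrate that machinery (again a sketch of my own, not from the original answer), here is how a single code point above FFFF₁₆ looks in each encoding form in Rust: U+1F600 becomes one surrogate pair in UTF-16 and four bytes in UTF-8, with no surrogates involved in the latter.

```rust
fn main() {
    let emoji = '\u{1F600}'; // U+1F600, a scalar value above FFFF₁₆

    // UTF-16: one code point above FFFF -> a high/low surrogate pair of code units.
    let mut utf16 = [0u16; 2];
    emoji.encode_utf16(&mut utf16);
    assert_eq!(utf16, [0xD83D, 0xDE00]); // high surrogate, low surrogate

    // UTF-8: the same scalar value -> four bytes, no surrogates anywhere.
    let mut utf8 = [0u8; 4];
    emoji.encode_utf8(&mut utf8);
    assert_eq!(utf8, [0xF0, 0x9F, 0x98, 0x80]);

    println!("UTF-16 units: {:04X?}, UTF-8 bytes: {:02X?}", utf16, utf8);
}
```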

To answer the question: they are not the same thing, and neither definition depends on the encoding scheme. A scalar value is simply any code point that is not a surrogate; the encoding forms only determine the "code units" a string is made of. A UTF-16 string (as in Java) is a sequence of 16-bit code units and may even contain unpaired surrogates. A UTF-8 string (as in Rust) is a sequence of bytes that decodes only to scalar values.
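
One last sketch (mine, not the answerer's) of that difference in guarantees: Rust refuses to build a string from UTF-16 data containing an unpaired surrogate, which is exactly the kind of value a Java String can carry.

```rust
fn main() {
    // A well-formed surrogate pair decodes to the scalar value U+1F600.
    let ok = String::from_utf16(&[0xD83D, 0xDE00]);
    assert_eq!(ok.unwrap(), "😀");

    // An unpaired high surrogate is a valid code point but not a scalar value,
    // so it cannot appear in a Rust string (UTF-8), although a Java String
    // (a sequence of arbitrary UTF-16 code units) could hold it.
    let bad = String::from_utf16(&[0xD83D]);
    assert!(bad.is_err());
}
```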

See also the Unicode glossary, which gives short definitions of these and other Unicode terms.


Source: https://habr.com/ru/post/1692775/

