Is there a Unicode encoding where every "character" is just one code point?

To rephrase: can every combination of characters be mapped to a single code point?

I am new to Unicode, but it seems to me that there is no encoding, normalization form or representation in which one character is always exactly one code point. Is that right?

Is this also true for the Basic Multilingual Plane?

+3
3 answers

char == (..: char //what-have-you): UCS-4 4- . , , , - .

( : e + ´ => é): , , . , ... , , .
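
The composition mentioned in the parenthetical above (an "e" followed by an acute accent becoming "é") can be tried out directly. This is only a minimal sketch using Python's standard `unicodedata` module; the variable names are illustrative and not part of the original answer.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed = "e\u0301"

# NFC composes the pair into the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(composed == "\u00e9")

# NFD goes the other way and splits U+00E9 back into two code points.
print(len(unicodedata.normalize("NFD", composed)))
```

Composition only helps where a precomposed code point happens to exist, so it does not give a general one-character-one-code-point guarantee.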

+7

?

? "à̴̵̶̷̸̡̢̧̨̛̖̗̘̙̜̝̞̟̠̣̤̥̦̩̪̫̬̭̮̯̰̳̹̺̻̼͇͈͉͍͎̄̅̆̇̈̉̊̋̌̍̏̐̑̒̓̔̽̾̿̓̈͆͊͋͌̕̚͏̴̵̶̷̸̡̢̧̨̛͓͔͕͖͙͚̖̗̘̙̜̝̞̟̠̣̤̥̦̩̪̫̬̭̮̯̰̳̹̺̻̼͇͈͉͍͎͐͑͒͗͛ͣͤͥͦͧͨͩͪͫͬͭͮͯ̄̅̆̇̈̉̊̋̌̍̏̐̑̒̓̔̽̾̿̓̈͆͊͋͌͘̕̚͜͟͢͝͞͠͏͓͔͕͖͙͚͐͑͒͗͛ͣͤͥͦͧͨͩͪͫͬͭͮͯ͘͜͟͢͝͞͠"? ( "a" , ?) .

Unicode "" , ççñü. C , .

+6

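I am new to Unicode, but it seems to me that there is no encoding, normalization form or representation in which one character is always exactly one code point. Is that right?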

"". Unicode ( 7 3 : " , , " ) ( 11: " ( ) " ). , , , , " ". ( 11, 4): " "

Is this also true for the Basic Multilingual Plane?

With respect to abstract or encoded characters, there is no conceptual difference between the BMP and the other planes. The statement above holds for every subset of the code space.
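
As a rough illustration of the BMP point, the sketch below takes a sequence whose code points all lie inside the BMP and shows that NFC normalization still leaves it as two code points. The choice of "q" plus a combining tilde is my own example, picked because Unicode has no precomposed code point for it.

```python
import unicodedata

# 'q' + U+0303 COMBINING TILDE: reads as one character,
# but Unicode defines no precomposed code point for it.
sequence = "q\u0303"

nfc = unicodedata.normalize("NFC", sequence)

# Every code point is inside the BMP (U+0000..U+FFFF) ...
in_bmp = all(ord(ch) <= 0xFFFF for ch in nfc)

# ... yet the normalized text still consists of two code points.
print(in_bmp, len(nfc))
```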

Depending on your application, you must distinguish between the terms glyph, grapheme cluster, grapheme, abstract character, encoded character, code point, scalar value, code unit and byte. All of these concepts are different, and there is no simple mapping between them. In particular, almost none of them map one-to-one onto each other.
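
To make a few of these distinctions concrete, here is a small sketch that counts code points, UTF-16 code units and UTF-8 bytes for the same text. The helper name `sizes` is hypothetical, and the sketch assumes a Python 3 `str`, whose length is measured in code points.

```python
def sizes(text):
    """Count code points, UTF-16 code units and UTF-8 bytes of text."""
    return {
        "code_points": len(text),
        # utf-16-le adds no byte order mark; each code unit is 2 bytes.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }

print(sizes("\u00e9"))       # precomposed é
print(sizes("e\u0301"))      # e + combining acute accent
print(sizes("\U0001D11E"))   # MUSICAL SYMBOL G CLEF, outside the BMP
```

The first two strings render identically but differ in all three counts, while the third is a single code point that needs two UTF-16 code units. Grapheme clusters are not covered here, because the Python standard library does not segment them.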

+1

Source: https://habr.com/ru/post/1783710/

