I'd hazard a guess that the last time this bit us was with Unicode: C++ originally specified wchar_t so that it could hold any Unicode character. That required at least 16 bits, because at the time Unicode promised it would never need more than 16 bits. Shortly after popular implementations settled on a 16-bit wchar_t, it turned out that 16 bits were not enough after all. Last I checked, Unicode code points need 21 bits (they go up to U+10FFFF), so why sell ourselves short again? 24-bit types are hardly widespread, and if you need to spell out a specific code point, it will most likely fit in 16 bits anyway, i.e. you can write it as \uNNNN (with \UNNNNNNNN available for code points beyond that range).
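A minimal sketch of the width issue (the output is platform-dependent; wchar_t is typically 16 bits on Windows/MSVC and 32 bits on most Unix-like systems):

    #include <cstdio>

    int main() {
        // wchar_t width is implementation-defined: usually 16 bits on
        // Windows (a UTF-16 code unit) and 32 bits on Linux/macOS.
        std::printf("wchar_t: %zu bits\n", sizeof(wchar_t) * 8);

        // A BMP code point fits in \uNNNN on any implementation...
        wchar_t e_acute = L'\u00E9';      // U+00E9, LATIN SMALL LETTER E WITH ACUTE

        // ...but a code point above U+FFFF needs \UNNNNNNNN, and where
        // wchar_t is 16 bits it cannot be stored in a single wchar_t.
        // wchar_t emoji = L'\U0001F600'; // lossy/ill-formed with 16-bit wchar_t
        char32_t emoji = U'\U0001F600';   // char32_t always holds any code point

        std::printf("U+%04X U+%06X\n",
                    static_cast<unsigned>(e_acute),
                    static_cast<unsigned>(emoji));
    }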
The wording of paragraph 2 of [lex.charset] (section 2.3 of the standard) implies that universal character names designate code points: a universal character name identifies a character by its short name in ISO/IEC 10646. I'm no Unicode expert, but I believe that short name is effectively the code point in U+NNNN notation.
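A quick way to see that a universal character name denotes a code point is to compare the char32_t value of an escaped literal against the raw number (a sketch, not an authoritative reading of the standard):

    #include <cassert>

    int main() {
        // \u00E9 names the character whose ISO/IEC 10646 short name is
        // U+00E9, so the resulting char32_t value *is* the code point.
        char32_t c = U'\u00E9';
        assert(c == 0x00E9);
    }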