Wide characters on Windows

Windows defines wchar_t as 16 bits wide. However, in the UTF-16 encoding it uses, some characters are encoded with 4 bytes (two 16-bit code units).

Does this mean that if I am developing for Windows, the following variable:

 wchar_t symbol = ... // Whatever 

can hold only part of an actual character?


And what happens if I do the same on *nix, where wchar_t is 32 bits wide?

1 answer

Yes, this means that symbol may hold only half of a surrogate pair on Windows. On *nix systems wchar_t is 32 bits wide and can hold any Unicode code point. Note that a code point is not the same as a character: some characters are encoded with more than one code point (for example, a base letter plus combining marks), so counting code points does not give you a character count. In particular, this means there is little reason to use anything other than UTF-8 encoded narrow strings anywhere outside of Unicode-handling libraries, even on Windows.

Read this old thread for more details.


Source: https://habr.com/ru/post/1384576/
