Yes, this means that symbol may contain one half of a surrogate pair on Windows, where wchar_t is 16 bits wide and holds UTF-16 code units. On most Unix-like systems wchar_t is 32 bits wide and can hold any Unicode code point. Note that a Unicode code point is not the same thing as a character: some characters are encoded as more than one code point, so counting wchar_t elements does not give you a character count on either platform. In particular, this means that it makes no sense to use anything other than UTF-8 encoded narrow-character strings anywhere outside of a Unicode library, even on Windows.
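To make the two points above concrete, here is a minimal sketch in C++11. The helper names to_utf16, count_code_points and check_examples are hypothetical and not taken from any library; the sketch uses char16_t and char32_t instead of wchar_t so that it behaves the same on every platform, and it assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode one code point as UTF-16 code units.
// Assumes cp is a valid scalar value (<= 0x10FFFF and not a surrogate).
std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit is enough.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split into a high and a low surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}

// Hypothetical helper: count code points in a UTF-8 string by skipping
// continuation bytes (those of the form 10xxxxxx). Assumes valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Small self-check illustrating both claims.
void check_examples() {
    // U+1F600 does not fit in one 16-bit unit, so a single 16-bit
    // wchar_t can only ever hold half of it.
    const std::u16string units = to_utf16(U'\U0001F600');
    assert(units.size() == 2);
    assert(units[0] == 0xD83D && units[1] == 0xDE00);

    // "e" followed by U+0301 (combining acute accent) is displayed as one
    // character but consists of two code points and three UTF-8 bytes.
    const std::string e_acute = "e\xCC\x81";
    assert(e_acute.size() == 3);
    assert(count_code_points(e_acute) == 2);
}
```

Calling check_examples() from main exercises both cases. Neither the byte count, the UTF-16 unit count, nor the code point count matches what a user would call the number of characters, which is why this kind of counting is best left to a Unicode library.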
Read this old thread for more details.