Per MSDN:
"For Microsoft C/C++, the source and execution character sets are both ASCII."
C++03
2.1 Translation phases
"... Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e. using the \uXXXX notation), are handled equivalently.)"
2.13.2 Character literals
"A universal-character-name is translated to the encoding, in the execution character set, of the character named. If there is no such encoding, the universal-character-name is translated to an implementation-defined encoding."
To check which execution character set MSVC++ uses, I wrote the following code:
    const wchar_t *str = L"中";
    const unsigned char *p = reinterpret_cast<const unsigned char*>(str);
    // sizeof(L"中") also counts the terminating L'\0'
    for (size_t i = 0; i < sizeof(L"中"); ++i) {
        printf("%x ", p[i]);
    }
The output is 2d 4e 0 0, and 0x4e2d is the UTF-16 encoding of this Chinese character (stored little-endian here, followed by the terminating null). Therefore I conclude that MSVC uses UTF-16 as its execution character set, at least for wide string literals. (My version: 2012 4.5.50709)
After that I tried to print this character on the Windows console. Since a program starts in the "C" locale by default, I called setlocale to switch to the environment's locale, which uses code page 936 (Simplified Chinese).
    // use the execution environment's locale setting, which is code page 936
    const wchar_t *str = L"中";
    char *locale = setlocale(LC_ALL, "");
    wprintf(L"%ls\n", str);
This prints:
中
What I am wondering is how a character encoded in UTF-16 can be displayed correctly by a Windows console whose locale is set to a non-UTF-16 code page (936). How can this happen?