Why do character arrays accept non-ASCII characters in C++?

So, I want to be able to use Chinese characters in my C++ program, and I need some type that can store characters outside the ASCII range.

However, I tried to run the following code and it worked.

    #include <iostream>

    int main() {
        char snet[4];
        snet[0] = '你';
        snet[1] = '爱';
        snet[2] = '我';
        std::cout << snet << std::endl;
        int conv = static_cast<int>(snet[0]);
        std::cout << conv << std::endl; // -96
    }

This doesn't make sense to me, since sizeof(char) evaluates to 1 with g++, but Chinese characters cannot be expressed in a single byte.

Why are Chinese characters allowed to be assigned to char here?

What type should be used to store Chinese characters, or non-ASCII characters in general, in C++?

1 answer

When compiling the code with the -Wall flag, you will see warnings like:

    warning: overflow in implicit constant conversion [-Woverflow]
        snet[2] = '我';

    warning: multi-character character constant [-Wmultichar]
        snet[1] = '爱';

Visual C++ in debug mode gives the following warning:

    c:\users\you\temp.cpp(9): warning C4566: character represented by universal-character-name '\u4F60' cannot be represented in the current code page (1252)

What happens under the hood is that your multi-byte Chinese characters are implicitly converted to char. That conversion overflows, which is why you see a negative value (or something else strange) when you print it to the console.

Why are Chinese characters allowed to be assigned to char here?

You can, but you shouldn't, just as you can write char c = 1000000;

What type should be used to store Chinese characters, or non-ASCII characters in general, in C++?

If you want to store Chinese characters and you can use C++11, go for UTF-8 encoding with std::string:

 std::string msg = u8"δ½ ηˆ±ζˆ‘"; 

Source: https://habr.com/ru/post/1274654/
