How to find out what the current C++ encoding is?

In a console application (WinXP) I get negative values for some characters (e.g. äöüé) with

(int)mystring[a] 

and it surprises me. I expected the values to be between 127 and 256.

So is there something like GetCharset() or SetCharset() in C++?

+4
4 answers

It depends on how you look at the value you have. char can be signed (for example, on Windows) or unsigned, as on some other systems. So what you need to do is print the value as unsigned to get what you are asking for.

C++ itself is charset-agnostic. For the Windows console, you can use GetConsoleOutputCP.
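As a rough sketch of the GetConsoleOutputCP suggestion: the wrapper below is hypothetical (the name console_output_code_page and the 0 fallback for non-Windows builds are assumptions made for this illustration, not part of any API). It only reports which code page the console uses for output; it says nothing about how the bytes inside your strings were encoded.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns the code page the Windows console uses
// for output. On non-Windows builds it returns 0, which is this
// sketch's own "not available" marker, not a real code page.
unsigned console_output_code_page() {
#ifdef _WIN32
    return GetConsoleOutputCP();
#else
    return 0;
#endif
}
```

On a Western European Windows console the result is often 850 or 437, and 65001 means UTF-8.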

+5

Take a look at std::numeric_limits<char>::min() and max(). Or CHAR_MIN and CHAR_MAX if you don't like typing, or if you need an integer constant expression.

If CHAR_MAX == UCHAR_MAX and CHAR_MIN == 0, then chars are unsigned (as you expected). If CHAR_MAX != UCHAR_MAX and CHAR_MIN < 0, they are signed (as you are seeing).
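The check described above can be sketched as follows. The function name plain_char_is_signed is made up for this illustration; the limits themselves come from the standard headers <limits> and <climits>.

```cpp
#include <climits>
#include <limits>

// True when plain char is a signed type on this implementation.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// The macro-based test from <climits> has to agree with numeric_limits.
static_assert(plain_char_is_signed() == (CHAR_MIN < 0),
              "CHAR_MIN and numeric_limits disagree about char signedness");

// If char is unsigned, it covers the same range as unsigned char.
static_assert(plain_char_is_signed() || CHAR_MAX == UCHAR_MAX,
              "unsigned plain char should have the range of unsigned char");
```

Because both checks are compile-time constants, the result cannot change while the program runs.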

The standard (3.9.1/1) ensures that there are no other possibilities: "... a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined."

That tells you whether char is signed or unsigned, and that is what is confusing you. You certainly can't call anything to change it: from the point of view of the program it is baked into the compiler, even if the compiler has ways to change it (GCC certainly does: -fsigned-char and -funsigned-char).

The usual way to handle this: if you are going to cast a char to int, cast it through unsigned char first. So in your example, (int)(unsigned char)mystring[a]. This ensures that you get a non-negative value.
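A minimal sketch of that double cast, wrapped in a helper whose name (byte_value) is invented for this example:

```cpp
// Hypothetical helper: returns the value of a char as a number in
// 0..UCHAR_MAX, regardless of whether plain char is signed.
int byte_value(char c) {
    // Going through unsigned char first removes the sign; widening the
    // result to int then cannot produce a negative number.
    return static_cast<int>(static_cast<unsigned char>(c));
}
```

With this, byte_value('\xE4') yields 228 whether char is signed or unsigned, while a plain (int) cast yields -28 on an implementation where char is signed and 8 bits wide.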

It doesn't actually tell you what encoding your implementation uses for char, but I don't think you need to know that. On Microsoft compilers, the answer is essentially that commonly used character encoding "ISO-8859-mutter-mutter". This means that chars with 7-bit ASCII values are represented by that value, while values outside that range are ambiguous and will be interpreted by the console or another recipient according to how that recipient is configured: ISO Latin 1 unless told otherwise.

Properly speaking, the way characters are interpreted is locale-specific, and the locale can be modified and queried using a whole bunch of facilities towards the end of the C++ standard, which I personally have never gone through and can't advise on ;-)
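For orientation only, here is a small sketch of querying the locale with std::setlocale from <clocale>. The helper name current_locale_name is an assumption made for this example. The format of the returned name is implementation-defined, so it may or may not mention an encoding.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: returns the name of the locale currently in
// effect for all categories. Passing a null pointer only queries the
// locale and does not change it.
std::string current_locale_name() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? std::string(name) : std::string();
}
```

A program starts in the "C" locale. Calling std::setlocale(LC_ALL, "") switches to the user's environment locale, after which the helper reports whatever name the implementation assigns to it.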

Note that if there is a mismatch between the encoding of your strings and the encoding your console uses, you may run into trouble. But I think that is separate from your problem: whether chars can be negative or not has nothing to do with encodings, only with whether char is signed.
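One way to investigate such a mismatch is to look at the raw bytes of the string instead of at what the console renders. The helper below is only a sketch, and its name (hex_bytes) is invented for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats every byte of s as two lowercase hex
// digits, separated by single spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[3];  // two hex digits plus the terminating null
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) out += ' ';
        // Cast through unsigned char so bytes above 0x7f stay positive.
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(s[i])));
        out += buf;
    }
    return out;
}
```

An "ä" stored as ISO Latin 1 shows up as the single byte e4, while the same letter stored as UTF-8 shows up as c3 a4, so the dump gives a hint about which encoding the data is actually in.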

+1

The only guarantee the standard provides is for members of the basic character set:

2.2 Character Sets

3 The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets are implementation-defined, and any additional members are locale-specific.

In addition, the type char is required to hold:

3.9.1 Basic types

1 Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set.

So there is no guarantee that you will get the value you expect for the characters you mentioned. However, try using an unsigned int to hold this value (for all practical purposes, it never makes sense to use a signed type to hold char values if you are going to print them or pass them around).
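The two guarantees quoted above (non-negative values for the basic character set, consecutive values for the decimal digits) can be illustrated with a short sketch. Both function names are invented for this example.

```cpp
#include <string>

// Converts a decimal digit character to its numeric value. This relies
// only on the guarantee that '0'..'9' have consecutive values.
int digit_value(char c) {
    return c - '0';
}

// Returns true if every character of s has a non-negative value when
// read as plain char. Members of the basic character set always do;
// characters such as 'ä' may not, if plain char is signed.
bool all_non_negative(const std::string& s) {
    for (char c : s) {
        if (c < 0) return false;
    }
    return true;
}
```

Nothing beyond these guarantees is promised for characters outside the basic set, which is why values for characters such as äöüé depend on the implementation.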

0

Characters are usually signed by default. Try this:

 cout << (int)(unsigned char) mystring[a] << endl;
0

Source: https://habr.com/ru/post/1304363/

