C++ towupper() does not convert certain characters

I am using Borland C++ Builder 2009, and my application has been translated into several languages, including Polish.

For a small piece of functionality, I use towupper() on a header line to emphasize it when the user ignores it at first.

The source string is loaded from the language DLL into a UTF-16 wstring object, and I convert it like this:

    int length = mystring.length();
    for (int x = 0; x < length; x++) {
        mystring[x] = towupper(mystring[x]);
    }

All this works well, except for Polish, where the following sentence: "Rozumiem ryzykowność wykonania tej operacji" turns into "ROZUMIEM RYZYKOWNOść WYKONANIA TEJ OPERACJI" instead of "ROZUMIEM RYZYKOWNOŚĆ WYKONANIA TEJ OPERACJI"

(note that the last two characters of the word "ryzykowność" are not converted).

It is not as if there were no uppercase Unicode variant of this character: Unicode character 346 (U+015A, Ś) does the trick. http://www.fileformat.info/info/unicode/char/015a/index.htm

Is this an obsolete library in an obsolete compiler installation, or am I missing something else?

+6
2 answers

Implementations of towupper are not required by the C++ standard to perform Unicode case conversions. That holds even if wide strings are Unicode strings, and even in cases where a single lowercase code point maps to a single uppercase code point.

In addition, towupper cannot convert Unicode text correctly in general, even if the implementation supports Unicode. Case conversion can change the number of code points in a Unicode character sequence (uppercasing ß, for example, yields SS), and towupper, which maps one character to exactly one character, cannot do that.
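To make the length-changing case concrete, here is a minimal sketch. The helper name `upper_with_expansion` is made up for this illustration, and it special-cases only ß (U+00DF), whose uppercase form is the two-letter sequence SS. Everything else still goes through towupper one character at a time, so this is not a general Unicode solution.

```cpp
#include <cwctype>
#include <string>

// Hypothetical helper (not a library function): uppercases a wide string,
// expanding U+00DF (sharp s) into the two characters "SS".
std::wstring upper_with_expansion(const std::wstring& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == L'\u00DF') {
            out += L"SS";  // one input character becomes two output characters
        } else {
            // One-to-one mapping; what it covers depends on the implementation.
            out += static_cast<wchar_t>(std::towupper(in[i]));
        }
    }
    return out;
}
```

An in-place loop of the form mystring[x] = towupper(mystring[x]) has no room for the extra character, which is why the result has to be built in a separate string here.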

You cannot rely on the standard C++ library to solve such Unicode issues. You will need to use a dedicated Unicode library such as ICU.
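If pulling in ICU is not practical and only a known set of letters is affected, one possible stopgap is to fall back on an explicit table whenever towupper leaves a character unchanged. The sketch below is a workaround under stated assumptions, not something the answer above proposes: `polish_upper` and `to_upper_polish` are hypothetical names, and the table covers only the nine Polish lowercase letters outside ASCII.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical fallback: returns the uppercase form of a Polish letter,
// or the input unchanged if the character is not in the table.
wchar_t polish_upper(wchar_t ch) {
    static const wchar_t table[][2] = {
        {0x0105, 0x0104},  // ą -> Ą
        {0x0107, 0x0106},  // ć -> Ć
        {0x0119, 0x0118},  // ę -> Ę
        {0x0142, 0x0141},  // ł -> Ł
        {0x0144, 0x0143},  // ń -> Ń
        {0x00F3, 0x00D3},  // ó -> Ó
        {0x015B, 0x015A},  // ś -> Ś
        {0x017A, 0x0179},  // ź -> Ź
        {0x017C, 0x017B}   // ż -> Ż
    };
    for (std::size_t i = 0; i < sizeof(table) / sizeof(table[0]); ++i) {
        if (table[i][0] == ch) return table[i][1];
    }
    return ch;
}

// Uppercases in place: tries towupper first, then the table above
// if towupper returned the character unchanged.
void to_upper_polish(std::wstring& s) {
    for (std::wstring::size_type i = 0; i < s.size(); ++i) {
        wchar_t up = static_cast<wchar_t>(std::towupper(s[i]));
        if (up == s[i]) up = polish_upper(s[i]);
        s[i] = up;
    }
}
```

This keeps the one-to-one, in-place shape of the original loop, so it shares the limitation described above: it cannot handle conversions that change the length of the string.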

+11

On Windows, this will work. (EDIT: I just realized that you are using Borland, not MSVC.)

    #include <cctype>
    #include <clocale>

    int main(int argc, char** argv)
    {
        // Select the Polish locale before converting.
        setlocale(LC_ALL, "polish");
        wchar_t c[2] = { L'ś', L'ć' };
        // _towupper_l and _get_current_locale are MSVC-specific.
        wchar_t c1 = _towupper_l(c[0], _get_current_locale());
        wchar_t c2 = _towupper_l(c[1], _get_current_locale());
        return 0;
    }

First you need to set the locale to Polish using setlocale, and then use _towupper_l. Here's a link that tells you which language strings can be used with setlocale.

EDIT: Please note that if I print the results:

 _wprintf_l(L" c1 = %c, c2 = %c\n", _get_current_locale(), c1, c2); 

The output will be:

 c1 = S, c2 = C 

But if I look at the values of c1 and c2 in my debugger, I can see the correct results with accents. My console just won't print such characters.

+2

Source: https://habr.com/ru/post/1013891/

