2-byte (UCS-2) wide strings under GCC

When porting my Visual C++ project to GCC, I found that the wchar_t data type is 4-byte UTF-32 by default. I could override that with a compiler option, but then the whole wcs* (wcslen, wcscmp, etc.) part of the RTL becomes unusable, since it assumes 4-byte wide strings.

For now, I have reimplemented 5-6 of these functions from scratch and #defined my implementations in. But is there a more elegant option - say, a build of the GCC RTL with a 2-byte wchar_t sitting quietly somewhere, waiting to be linked?

The specific flavors of GCC I am dealing with are the one in Xcode on Mac OS X, Cygwin, and the one that comes with Debian Linux Etch.

+3
4 answers

I re-implemented 5-6 of the more common wcs* functions and #defined my implementations in.
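A minimal sketch of what such replacements might look like, assuming a C++11 compiler so that char16_t is available as the 2-byte character type. The names ucs2_char, ucs2_len and ucs2_cmp are hypothetical and not taken from the original project:

```cpp
#include <cstddef>

// Hypothetical 2-byte character type; the original project's type is not shown.
typedef char16_t ucs2_char;

// Counts 16-bit code units up to (not including) the terminating zero,
// mirroring what wcslen does for wchar_t strings.
std::size_t ucs2_len(const ucs2_char* s) {
    const ucs2_char* p = s;
    while (*p) ++p;
    return static_cast<std::size_t>(p - s);
}

// Compares two strings code unit by code unit, like wcscmp:
// negative, zero, or positive depending on the first difference.
int ucs2_cmp(const ucs2_char* a, const ucs2_char* b) {
    while (*a && *a == *b) { ++a; ++b; }
    return (*a > *b) - (*a < *b);
}

// The "#defined in" part could then look like this (commented out here,
// because redefining standard library names is fragile):
// #define wcslen ucs2_len
// #define wcscmp ucs2_cmp
```

With these, ucs2_len(u"abc") gives 3, and ucs2_cmp orders strings by raw 16-bit code unit values rather than by any locale rules.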

0

But is there a more elegant option - say, a build of the GCC RTL with a 2-byte wchar_t sitting quietly somewhere, waiting to be linked?

No. This is a platform issue, not a GCC problem.

That is, the Linux platform ABI specifies that wchar_t is 32 bits wide, so you either need to use a whole new library (for which ICU is a popular choice), or port your code to handle a 4-byte wchar_t. Any libraries that you might link against will also assume a 4-byte wchar_t and will break if you call them from code built with GCC's -fshort-wchar.
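If you go the porting route, one piece you will likely need is a conversion from 2-byte data (files or buffers written by the Windows build) into the platform's native wchar_t. The following is only a sketch under that assumption; the helper name utf16_to_wide is hypothetical, and it needs a C++11 compiler for std::u16string:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 code units into a native wide string.
// Where wchar_t is 4 bytes, surrogate pairs are combined into one code point;
// where wchar_t is 2 bytes, code units are copied unchanged.
// Unpaired surrogates are passed through as-is rather than rejected.
std::wstring utf16_to_wide(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        if (sizeof(wchar_t) >= 4 && c >= 0xD800 && c <= 0xDBFF && i + 1 < in.size()) {
            char32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                // Combine high and low surrogate into a single code point.
                c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

After the conversion, the regular wcs* functions and std::wstring can be used unchanged, since the data is in whatever width the platform's RTL expects.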

On Linux specifically, though, almost everyone has standardized on UTF-8 as the multibyte encoding.
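In practice that means text usually travels through plain char strings. As a rough illustration only (the helper name append_utf8 is hypothetical, and it assumes the input is a valid code point, i.e. at most 0x10FFFF and not a surrogate), encoding one code point as UTF-8 could look like this:

```cpp
#include <string>

// Hypothetical helper: appends one Unicode code point to a UTF-8 string.
// Assumes cp is a valid scalar value; no validation is performed.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        // 1 byte: plain ASCII.
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}
```

The resulting std::string can be passed to the usual char * APIs, which is why the wchar_t width question rarely comes up in Linux-only code.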

+2

Use ICU. Its API is based on UTF-16.

+1

As you noticed, wchar_t is implementation-defined, so its width cannot be relied on across platforms.

On Linux systems, Unicode support generally settled on UTF-8 rather than UCS-2. The APIs take char * and treat it as Unicode.

If you want something cross-platform, use a library: Qt, ICU, etc.

Also, Cygwin has a 2-byte wchar_t, the same as Windows.

+1

Source: https://habr.com/ru/post/1744537/

