2-byte (UCS-2) wide strings under GCC

Question

When porting my Visual C++ project to GCC, I found that the default wchar_t data type is 4-byte UTF-32. I could override that with a compiler option, but then the whole wcs* part of the RTL (wcslen, wcscmp, etc.) becomes unusable, since it assumes 4-byte wide strings.

For now, I have reimplemented 5-6 of these functions from scratch and #defined my implementations in. But is there a more elegant option, say, a build of the GCC RTL with a 2-byte wchar_t sitting quietly somewhere, waiting to be linked?

The specific flavors of GCC I care about are Xcode on Mac OS X, Cygwin, and the one that comes with Debian Linux Etch.

+3

c++ gcc rtl ucs2

Seva Alekseyev May 07, '10 at 17:28

4 answers

But is there a more elegant option, say, a build of the GCC RTL with a 2-byte wchar_t sitting quietly somewhere, waiting to be linked?

No. This is a platform issue, not a GCC problem.

That is, the Linux platform ABI specifies that wchar_t is 32 bits wide, so you either need to use a whole new library (ICU is a popular choice) or port your code to handle a 4-byte wchar_t. Any library you link against will also assume a 4-byte wchar_t, and will break if you build with GCC's -fshort-wchar.
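To see which width a given toolchain and set of flags actually produces, a compile-time check can help. This is only a minimal sketch: the helper names wchar_bits and wchar_is_16bit are made up for illustration and are not part of any library.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helpers, named here for illustration only.

// Width of wchar_t in bits for the platform and flags this file is built with.
constexpr std::size_t wchar_bits() { return sizeof(wchar_t) * CHAR_BIT; }

// True when wide strings use 16-bit units, as on Windows and Cygwin,
// or when the file is compiled with GCC's -fshort-wchar.
constexpr bool wchar_is_16bit() { return wchar_bits() == 16; }
```

A line such as static_assert(!wchar_is_16bit(), "linked libraries expect 32-bit wchar_t"); in a Linux build would then fail as soon as -fshort-wchar sneaks into the flags.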

On Linux specifically, though, almost everyone has standardized on UTF-8 for all multibyte encodings.
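If the ported code keeps 2-byte strings internally, it would need to convert them to UTF-8 before calling char-based APIs. A rough sketch of such a conversion follows; the function name ucs2_to_utf8 is hypothetical, char16_t assumes a C++11 compiler, and only UCS-2 (no surrogate pairs) is handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a zero-terminated UCS-2 string as UTF-8.
// UCS-2 covers only the Basic Multilingual Plane, so each code unit
// becomes at most three bytes and surrogates are not combined.
inline std::string ucs2_to_utf8(const char16_t* s) {
    assert(s != nullptr);
    std::string out;
    for (; *s != 0; ++s) {
        const unsigned c = *s;
        if (c < 0x80) {                        // 1 byte: 0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                               // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

For real UTF-16 input with surrogate pairs, a library such as ICU is the safer route.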

+2

greyfade 07 '10 17:59

Use ICU. Its API is UTF-16.

+1

bmargulies 07 '10 17:31

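The width of wchar_t varies by platform.

On Linux, Unicode text is generally handled as UTF-8 rather than UCS-2. The system APIs take char * and work with Unicode.

Your best option is a library that handles this for you: Qt, ICU, etc.

Note that Cygwin has a 2-byte wchar_t, matching Windows.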

+1

Yann Ramin 07 '10 17:58

Seva Alekseyev · Accepted Answer · 2010-10-12T03:41:55+0000

I re-implemented 5-6 of the more common wcs* functions and #defined my implementations in.
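The answer does not show the replacement functions, so the following is only a sketch of what two of them might look like. The names ucs2_char, ucs2_len and ucs2_cmp are invented for illustration, and char16_t assumes a C++11 (constexpr loops: C++14) compiler, which the original 2010 code would not have had.

```cpp
#include <cstddef>

// Hypothetical names; not part of any library.
// char16_t is 16 bits wide on the platforms discussed here, unlike wchar_t.
typedef char16_t ucs2_char;

// Counterpart of wcslen: number of code units before the terminating zero.
constexpr std::size_t ucs2_len(const ucs2_char* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Counterpart of wcscmp: negative, zero, or positive result,
// comparing the strings code unit by code unit.
constexpr int ucs2_cmp(const ucs2_char* a, const ucs2_char* b) {
    while (*a != 0 && *a == *b) { ++a; ++b; }
    return (*a > *b) - (*a < *b);
}
```

Existing call sites could then be redirected with macros in the spirit of the answer, for example #define wcslen ucs2_len, provided every wide string in the project uses the 16-bit type.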
