How does gcc determine the wide character set when calling `mbtowc()`?

Question

According to the gcc manual, the -fwide-exec-charset option sets the wide execution character set, which is used for wide string and character constants at compile time.

But what is the wide character set when a multibyte character is converted to a wide character by calling mbtowc() at run time? The POSIX standard says that the character set of multibyte characters is determined by the LC_CTYPE category of the current locale, but says nothing about the wide character set. I don't have the C standard at hand, so I don't know what it says on this point.

Does gcc's -fwide-exec-charset option also determine the wide character set used by mbtowc(), as it does at compile time?

c character-encoding multibyte widechar

spockwang Mar 15 '13 at 6:27

1 answer

user4815162342 · Accepted Answer · 2013-03-15T07:18:49+0000

Short answer: the character set used for wide strings is determined by the characteristics of wchar_t known at compile time. Since mbtowc is a library function, that decision is made when libc is built.

mbtowc reads a single character from a string encoded in an external encoding and writes it to a wchar_t value, which can represent any character. Likewise, mbstowcs converts an externally encoded C string into a plain array of wchar_t. From the system's point of view, it makes no sense to specify the "encoding" of the resulting wide character or string, because changing the output encoding in any way would break the use of the resulting wide string as an array of wchar_t.
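To illustrate the point above, here is a minimal sketch of a wrapper around mbtowc. The name decode_first is hypothetical and chosen only for this example. Note that the only encoding-related input is the current locale, which governs the input bytes; nothing in the call selects an encoding for the wchar_t result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the first multibyte character of s under the
 * current LC_CTYPE locale and store it in *out.
 * Returns the number of bytes consumed, 0 if s is the empty string,
 * or -1 if the bytes are not valid in the current locale. */
int decode_first(const char *s, wchar_t *out)
{
    assert(s != NULL && out != NULL);

    /* Reset any shift state left over from earlier calls. */
    mbtowc(NULL, NULL, 0);

    /* The locale decides how the input bytes are interpreted; the value
     * written to *out is simply a wchar_t as this platform defines it. */
    return mbtowc(out, s, strlen(s) + 1);
}
```

Calling setlocale(LC_CTYPE, "") before decode_first makes it interpret the bytes in the user's locale encoding. That changes how the input is read, not how wchar_t values are laid out.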

You can describe mbstowcs as producing a fixed-width Unicode encoding such as UCS-2 or UCS-4 (more precisely, UTF-16 or UTF-32), provided that wide characters correspond to ISO 10646 code points, and depending on the width of wchar_t. You could also describe the result as little-endian or big-endian, depending on how your processor represents a wchar_t in memory. But these are properties of the platform, which you cannot change at run time any more than you can change endianness or swap ASCII for EBCDIC.
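The platform properties mentioned above can be inspected from a program, though not changed. The following is a rough sketch; the function names are made up for this example, and __STDC_ISO_10646__ is the standard macro an implementation defines when wchar_t values are ISO 10646 code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Returns 1 if the implementation promises that wchar_t values are
 * ISO 10646 code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the least significant byte of a wchar_t is stored first. */
int wchar_is_little_endian(void)
{
    wchar_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Print the fixed, platform-defined properties of wchar_t. */
void describe_wchar(FILE *fp)
{
    assert(fp != NULL);
    fprintf(fp, "sizeof(wchar_t): %zu bytes\n", sizeof(wchar_t));
    fprintf(fp, "ISO 10646 values: %s\n",
            wchar_is_iso10646() ? "yes" : "not promised");
    fprintf(fp, "byte order: %s-endian\n",
            wchar_is_little_endian() ? "little" : "big");
}
```

Calling describe_wchar(stdout) prints the same answers regardless of the locale in effect, since none of these properties depend on LC_CTYPE.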

-fwide-exec-charset serves to tell the compiler explicitly which encoding corresponds to the internal representation of an array of wchar_t. This is useful when that encoding differs from the representation the compiler would normally generate (because you are cross-compiling or because the compiler was misconfigured). That is why the manual warns that "you will have problems with encodings that do not fit exactly in wchar_t".
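One way to see the relationship is to compare a wide string literal, which the compiler encodes, with the output of mbstowcs, which libc produces. The sketch below uses a hypothetical helper name, and it assumes the program is built without overriding -fwide-exec-charset, so both sides should use the same wchar_t representation.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

#define WIDE_BUF_LEN 64

/* Hypothetical helper: convert mb with mbstowcs under the current locale and
 * compare the result with a wide literal produced by the compiler.
 * Returns 1 if they are equal, 0 if they differ, if the conversion fails,
 * or if the input does not fit in the buffer. */
int literal_matches_runtime(const wchar_t *literal, const char *mb)
{
    wchar_t buf[WIDE_BUF_LEN];
    size_t n;

    assert(literal != NULL && mb != NULL);

    n = mbstowcs(buf, mb, WIDE_BUF_LEN);
    if (n == (size_t)-1 || n >= WIDE_BUF_LEN) {
        return 0;   /* invalid input, or no room left for the terminator */
    }
    return wcscmp(buf, literal) == 0;
}
```

If the literal were compiled with a -fwide-exec-charset value that does not match what libc expects for wchar_t, a comparison like this is where the mismatch would show up.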
