Should I use wchar_t when using UTF-8?

Question


UTF-8 can encode a character in anywhere from 1 to 4 bytes. A char on my system is only 1 byte. Should I use wchar_t as a precaution so that I can match any arbitrary UTF-8 encoded character?


c++ unicode utf-8

0x499602D2 Jul 26 '13 at 2:24

source share

2 answers

wchar_t is not very useful if you want your code to be portable.

From Wikipedia:

The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers.

Further:

Both C and C++ introduced the fixed-size character types char16_t and char32_t in the 2011 revisions of their respective standards to provide unambiguous representation of the 16-bit and 32-bit Unicode transformation formats, leaving wchar_t implementation-defined.

0decimal0 Jul 26 '13 at 2:32

source share

duskwuff · Accepted Answer · 2013-07-26T02:33:01+0000

No, you shouldn't! The Unicode 4.0 standard (ISO 10646:2003) states that:

The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text.

In most cases, the individual "characters" of UTF-8 text will not matter to your program, so treating the text as an array of char elements, just like any other string, will suffice. If you do need to extract individual characters, however, they must be stored in a type wide enough to hold every Unicode code point: at least 21 bits, since the largest code point is U+10FFFF, which in practice means a 32-bit type such as uint32_t.