Should I use wchar_t when using UTF-8?

UTF-8 encodes a character in 1 to 4 bytes. A char on my system is only 1 byte. Should I use wchar_t as a precaution so that I can hold any arbitrary UTF-8 encoded character?

+4
2 answers

No, you should not! The Unicode 4.0 standard (ISO 10646:2003) states that:

The width of wchar_t is compiler-specific and can be as small as 8 bits. Therefore, programs that must be portable across any C or C++ compiler should not use wchar_t to store Unicode text.

In most cases, the "character" nature of UTF-8 text will not be relevant to your program, so treating it as an array of char elements, like any other string, will suffice. However, if you need to extract individual characters, they must be stored in a type at least 24 bits wide (e.g. uint32_t) to accommodate all Unicode code points.
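To illustrate the last point, here is a minimal sketch of pulling code points out of a UTF-8 char array into uint32_t values. The function name decode_utf8 is made up for this example, and the sketch assumes that malformed input should simply be replaced with U+FFFD. It does not reject overlong encodings or surrogate values, so treat it as an illustration rather than a validating decoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Malformed or truncated sequences produce U+FFFD (an assumption of this sketch).
std::vector<std::uint32_t> decode_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;

        // The lead byte tells us how many bytes the sequence occupies.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        // Sequence runs past the end of the input: emit U+FFFD and stop.
        if (i + len > bytes.size()) { out.push_back(0xFFFD); break; }

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, the three bytes E2 82 AC decode to the single code point 0x20AC (the euro sign), which fits in uint32_t but not in a 1-byte char.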

+8

wchar_t is not very useful if you want your code to be portable.

From Wikipedia:

The width of wchar_t is compiler-specific and can be as small as 8 bits. Therefore, programs that must be portable across any C or C++ compiler should not use wchar_t to store Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers.

Further,

Both C and C++ introduced the fixed-size character types char16_t and char32_t in the 2011 revisions of their respective standards to provide an unambiguous representation of the 16-bit and 32-bit Unicode transformation formats, leaving wchar_t implementation-defined.
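As a rough sketch of what those types give you in C++11 and later: the u"" and U"" literal prefixes produce UTF-16 and UTF-32 strings whose element width does not depend on the compiler. The helper names sample_utf16 and sample_utf32 are invented for this example.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Unlike wchar_t, these widths are guaranteed on every conforming compiler.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t holds a UTF-16 code unit");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t holds any code point");

// The same three characters (U+0061, U+20AC, U+1F600) in both encodings.
// Hypothetical helpers, only here to make the literals easy to inspect.
std::u16string sample_utf16() { return u"a\u20AC\U0001F600"; }
std::u32string sample_utf32() { return U"a\u20AC\U0001F600"; }
```

sample_utf32() has one element per code point (3), while sample_utf16() has 4 elements because U+1F600 lies outside the Basic Multilingual Plane and needs a surrogate pair.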

+2

Source: https://habr.com/ru/post/1493558/

