Unicode has over 100,000 characters. Most C implementations have 256 possible char values.
Therefore, UTF-8 uses more than one char to encode most characters, and the decoder needs a return type that is larger than char.
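Here is a minimal UTF-8 decoding sketch that makes the size argument concrete. utf8_decode is a hypothetical helper, not the decoder from the question. It returns int32_t because code points go up to 0x10FFFF. It also skips checks a production decoder needs, such as rejecting overlong forms, surrogates and values above 0x10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one UTF-8 sequence starting at s.
   On success, stores the number of bytes consumed in *len and returns the
   code point; returns -1 if the sequence is malformed.
   NOTE: overlong forms, surrogates and values > 0x10FFFF are not rejected. */
int32_t utf8_decode(const unsigned char *s, size_t *len)
{
    int32_t cp;
    size_t n, i;

    if (s[0] < 0x80) {                    /* 0xxxxxxx: 1 byte (ASCII) */
        *len = 1;
        return s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte sequence */
        cp = s[0] & 0x1F; n = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte sequence */
        cp = s[0] & 0x0F; n = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte sequence */
        cp = s[0] & 0x07; n = 4;
    } else {
        return -1;                        /* stray continuation byte or bad lead byte */
    }
    for (i = 1; i < n; i++) {
        /* Every continuation byte must look like 10xxxxxx. A terminating '\0'
           fails this test, so we never read past the end of a C string. */
        if ((s[i] & 0xC0) != 0x80)
            return -1;
        cp = (cp << 6) | (s[i] & 0x3F);   /* append 6 payload bits */
    }
    *len = n;
    return cp;
}
```

For example, the three bytes E2 82 AC decode to 0x20AC (the euro sign), a value no 8-bit char can hold.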
wchar_t is a larger type than char (strictly speaking it doesn't have to be, but it usually is). It represents the characters of an implementation-defined wide character set. On some implementations it is still not large enough to represent every Unicode character. The most important example is Windows, where wchar_t is 16 bits and characters outside the Basic Multilingual Plane are encoded as surrogate pairs. That seems to be the reason the decoder you are using returns int.
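Here is a sketch of why 16 bits are not enough. A code point above 0xFFFF has to be split into a UTF-16 surrogate pair, and a 16-bit wchar_t on Windows holds only one half of that pair. utf16_encode is a hypothetical helper for illustration. It uses uint16_t rather than wchar_t so the result does not depend on the platform.

```c
#include <stdint.h>

/* Hypothetical helper: encodes code point cp as UTF-16.
   Writes one or two 16-bit units to out and returns how many were written,
   or 0 if cp is not a valid Unicode scalar value. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                /* out of range, or a surrogate */
    if (cp < 0x10000) {                          /* Basic Multilingual Plane: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* leaves a 20-bit value */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

U+1F600 comes out as the pair D83D DE00, so a single 16-bit unit cannot hold it.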
You cannot print wide characters with printf, because it deals with char. wprintf deals with wchar_t. If the wide character set is Unicode, and wchar_t is as wide as int on your system (as it is on Linux), then wprintf and friends will print the decoder output without further processing. Otherwise, they won't.
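Here is a minimal sketch of the wide-character route. It assumes a platform such as Linux with glibc, where wchar_t is 32 bits and holds Unicode code points. format_code_point and print_wide_line are hypothetical names. The first uses swprintf so the result can be inspected. The second shows the usual terminal path through setlocale and wprintf.

```c
#include <locale.h>
#include <wchar.h>

/* Formats a code point with swprintf, the wide sibling of sprintf.
   ASSUMPTION: wchar_t holds Unicode code points directly (e.g. Linux/glibc);
   on a 16-bit wchar_t, values above 0xFFFF would not fit. */
int format_code_point(wchar_t *buf, size_t n, long cp)
{
    /* In a wide format string, %lc writes its wint_t argument as a wchar_t. */
    return swprintf(buf, n, L"U+%04lX %lc", (unsigned long)cp, (wint_t)cp);
}

/* Prints a wide string to the terminal. setlocale(LC_ALL, "") picks up the
   environment's encoding (e.g. UTF-8) for converting wide characters to bytes.
   Don't mix printf and wprintf on the same stream: the first call fixes
   whether stdout is byte- or wide-oriented. */
void print_wide_line(const wchar_t *s)
{
    setlocale(LC_ALL, "");
    wprintf(L"%ls\n", s);
}
```

In a UTF-8 locale, print_wide_line(L"caf\u00e9") should print café. Without a suitable locale, the conversion to bytes may fail.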
In any case, you cannot portably print arbitrary Unicode characters. There is no guarantee that the terminal can display them, or even that the wide character set has anything to do with Unicode.
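Both caveats can be partly checked at run time, as the following sketch shows. The function names are hypothetical. __STDC_ISO_10646__ is the standard C macro an implementation defines when wchar_t values are Unicode code points. wcrtomb reports whether the current locale's encoding can represent a given character. Neither check tells you whether the terminal's font can actually draw the character.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Returns 1 if the implementation promises that wchar_t values are
   ISO 10646 (Unicode) code points, 0 if it makes no such promise. */
int wide_charset_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if wc can be converted to a multibyte sequence in the current
   LC_CTYPE locale (call setlocale(LC_ALL, "") first to use the user's
   locale). This checks the encoding only, not the terminal's fonts. */
int locale_can_encode(wchar_t wc)
{
    char buf[MB_LEN_MAX];
    mbstate_t st;
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    return wcrtomb(buf, wc, &st) != (size_t)-1;
}
```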
SQLite probably used unsigned char because:
- they know its signedness. Whether plain char is signed or not is implementation-defined.
- they can right-shift it and assign out-of-range values to it, and get consistent, well-defined results on every C implementation. Implementations have more freedom in how signed char behaves than in how unsigned char behaves.
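The following small sketch illustrates the difference and uses hypothetical helper names. With unsigned char, both a right shift and an out-of-range conversion have results fixed by the standard. The signedness of plain char is something you have to ask the implementation about.

```c
#include <limits.h>

/* 1 if plain char is signed on this implementation, 0 if it is unsigned.
   The standard leaves this choice to the implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Top two bits of a byte, e.g. to classify a UTF-8 byte (0xC3 -> 3).
   An unsigned char promotes to a non-negative int, so the shift is well
   defined; right-shifting a negative value is implementation-defined. */
int top_two_bits(unsigned char c)
{
    return c >> (CHAR_BIT - 2);
}

/* Converting an out-of-range value to unsigned char wraps modulo
   UCHAR_MAX + 1 (300 -> 44 with 8-bit chars). Converting the same value
   to signed char gives an implementation-defined result instead. */
unsigned char to_byte(int v)
{
    return (unsigned char)v;
}
```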