The default std::codecvt<char, char, mbstate_t> will not help you: it is defined to perform no conversion at all. You need to imbue() a std::locale that contains a UTF-8 aware code conversion facet. That said, char cannot really represent Unicode values, so you need a larger type. The values you are looking at do fit into a char as Unicode code points, but not in any encoding that can represent all Unicode values.
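To illustrate that last point, here is a minimal sketch of UTF-8 encoding done by hand. The helper name encode_utf8 is my own and not part of any library. It shows that a code point such as U+00D6 (Ö), which fits in eight bits as a value, still needs two bytes once it is encoded as UTF-8:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a standard function): encode a single Unicode
// code point as a UTF-8 byte sequence.
std::string encode_utf8(char32_t cp) {
    // Reject values outside the Unicode range and UTF-16 surrogates.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8(0xD6) yields the two bytes 0xC3 0x96, while encode_utf8('A') yields the single byte 0x41. Only the ASCII range survives as one char per character.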
The C++ 2011 standard defines a UTF-8 conversion facet, std::codecvt_utf8<...>. However, it is not available for the internal type char, only for wchar_t, char16_t and char32_t. Using clang together with libc++, I could do the following to get the right result:
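    #include <codecvt>
    #include <fstream>
    #include <locale>

    int main() {
        std::wofstream out("utf8.txt");
        // Build a locale from the global one plus a UTF-8 conversion facet;
        // the locale takes ownership of the facet.
        std::locale utf8(std::locale(), new std::codecvt_utf8<wchar_t>());
        out.imbue(utf8);
        out << L"\xd6\xf6\xfc\n";  // Ö, ö, ü given as code point values
        out << L"Ööü\n";           // same text; relies on the compiler decoding the source file correctly
    }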
Note that this code uses wchar_t rather than char. It might look reasonable to use char16_t or char32_t, because these are meant to be UCS2 and UCS4 encoded, respectively (if I understand the standard correctly), but there are no stream types defined for them. Setting up stream types for a new character type is somewhat of a pain.
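If you do want to work with char32_t, one way around the missing stream types is to convert the text to UTF-8 bytes first and then write those through an ordinary char stream. The following is only a sketch under that assumption: to_utf8 and write_utf8 are names I made up, it uses std::wstring_convert from <codecvt>, which is deprecated since C++17, and I have not checked it against every standard library.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: convert UTF-32 text to a UTF-8 encoded byte string.
// to_bytes() throws std::range_error if the input cannot be converted.
std::string to_utf8(const std::u32string& text) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    return converter.to_bytes(text);
}

// Hypothetical helper: write the converted bytes with a plain char stream,
// so no char32_t stream type is needed.
void write_utf8(const std::string& path, const std::u32string& text) {
    std::ofstream out(path, std::ios::binary);
    out << to_utf8(text);
}
```

For example, write_utf8("utf8.txt", U"\u00D6\u00F6\u00FC\n") should produce the same bytes for Ööü as the wchar_t version above, with the conversion happening before the stream is involved rather than inside it.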