Convert string from UTF-8 to ISO-8859-1

I am trying to convert a UTF-8 string to an ISO-8859-1 char* for use in legacy code. The only way I can see to do this is with iconv.

I would definitely prefer a fully C++-based string solution, and then just call .c_str() on the resulting string.

How can I do it? Sample code, if possible, please. I am fine with using iconv if that is the only solution you know of.

+6
3 answers

I'm going to adapt my code from another answer to implement the suggestion from Alf.

    #include <string>

    std::string UTF8toISO8859_1(const char * in)
    {
        std::string out;
        if (in == NULL)
            return out;

        // Initialized so that a stray leading continuation byte never reads garbage.
        unsigned int codepoint = 0;
        while (*in != 0)
        {
            unsigned char ch = static_cast<unsigned char>(*in);
            if (ch <= 0x7f)
                codepoint = ch;                              // ASCII byte
            else if (ch <= 0xbf)
                codepoint = (codepoint << 6) | (ch & 0x3f);  // continuation byte
            else if (ch <= 0xdf)
                codepoint = ch & 0x1f;                       // lead byte, 2-byte sequence
            else if (ch <= 0xef)
                codepoint = ch & 0x0f;                       // lead byte, 3-byte sequence
            else
                codepoint = ch & 0x07;                       // lead byte, 4-byte sequence
            ++in;
            // The sequence is complete once the next byte is not a continuation byte.
            if (((*in & 0xc0) != 0x80) && (codepoint <= 0x10ffff))
            {
                if (codepoint <= 255)
                {
                    out.append(1, static_cast<char>(codepoint));
                }
                else
                {
                    // do whatever you want for out-of-bounds characters
                }
            }
        }
        return out;
    }

Invalid UTF-8 input results in dropped characters.

+7

Convert UTF-8 to 32-bit Unicode first.

Then keep the values that are in the range 0 through 255.

Those are the Latin-1 code points. For other values, decide whether you want to treat them as an error or replace them with, say, code point 127 (my fav, the ASCII "del"), a question mark, or something else.
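The second step, narrowing the 32-bit code points to Latin-1, might look roughly like the sketch below. The helper names `to_latin1_lossy` and `to_latin1_strict` are made up for illustration, and the input is assumed to already be decoded into a `std::u32string`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: narrow UTF-32 code points to Latin-1,
// substituting `replacement` for anything above U+00FF.
std::string to_latin1_lossy(const std::u32string& in, char replacement = '?')
{
    std::string out;
    out.reserve(in.size());
    for (char32_t cp : in)
    {
        if (cp <= 0xFF)
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        else
            out.push_back(replacement);
    }
    return out;
}

// Hypothetical strict variant: treat unrepresentable code points as an error.
std::string to_latin1_strict(const std::u32string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char32_t cp : in)
    {
        if (cp > 0xFF)
            throw std::range_error("code point not representable in ISO-8859-1");
        out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
    }
    return out;
}
```

Passing `'\x7F'` as the replacement gives the "del" behaviour. The strict variant suits callers that would rather fail than silently lose data.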


The C++ standard library defines a std::codecvt specialization that can be used for the first step:

 template<> codecvt<char32_t, char, mbstate_t> 

C++11 §22.4.1.4/3: "the specialization codecvt<char32_t, char, mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes"
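As a rough sketch of how that facet could be driven directly: the function name `utf8_to_utf32` is hypothetical, and the code assumes the facet is obtained from the classic locale via `std::use_facet`. Note that this specialization was deprecated in C++20, so newer compilers may warn about it.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-32 with the standard facet.
std::u32string utf8_to_utf32(const std::string& in)
{
    typedef std::codecvt<char32_t, char, std::mbstate_t> Facet;
    const Facet& facet = std::use_facet<Facet>(std::locale::classic());

    // One code point per input byte is an upper bound on the output size.
    std::u32string out(in.size(), U'\0');
    if (in.empty())
        return out;

    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    char32_t* to_next = nullptr;
    const std::codecvt_base::result res = facet.in(
        state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    // Anything other than `ok` means malformed or truncated input.
    if (res != std::codecvt_base::ok)
        throw std::runtime_error("invalid or truncated UTF-8 input");

    // Shrink to the number of code points actually written.
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

The result holds one `char32_t` per code point, so each element can be tested against 255 directly.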

+6

Alf's suggestion implemented in C++11:

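    #include <algorithm>
    #include <codecvt>
    #include <iterator>
    #include <locale>
    #include <string>

    int main()
    {
        // Note: std::wstring_convert and <codecvt> are deprecated since C++17.
        // In C++20 a u8"..." literal is const char8_t*, so use a plain narrow
        // UTF-8 string there instead.
        auto i = u8"H€llo Wørld";
        std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8;
        auto wide = utf8.from_bytes(i);
        std::string out;
        out.reserve(wide.length());
        // Keep code points 0..255, replace everything else with '?'.
        std::transform(wide.cbegin(), wide.cend(), std::back_inserter(out),
                       [](const wchar_t c) { return (c <= 255) ? static_cast<char>(c) : '?'; });
        // out now contains "H?llo W\xf8rld"
    }
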
+1

Source: https://habr.com/ru/post/969363/

