How to convert from UTF-8 to ANSI using standard C++

I have some strings read from a database, stored in a char * in UTF-8 format (you know, "á" is encoded as 0xC3 0xA1). To write them to a file, I first need to convert them to ANSI (I can't make the file UTF-8; it is only ever read as ANSI), so that my "á" doesn't become "Ã¡". Yes, I know that some data will be lost (Chinese characters, and in general anything not in the ANSI code page), but that is exactly what I need.

The catch is that the code needs to compile on different platforms, so it must be standard C++ (i.e. no WinAPI; only stdlib, STL, CRT, or any custom library with available source code).

Anyone have any suggestions?

+4
2 answers

A few days ago, someone replied that if I had a C++11 compiler, I could try the following:

#include <codecvt>
#include <locale>
#include <string>
#include <vector>

using namespace std;

string utf8_to_string(const char *utf8str, const locale& loc)
{
    // UTF-8 to wstring
    wstring_convert<codecvt_utf8<wchar_t>> wconv;
    wstring wstr = wconv.from_bytes(utf8str);

    // wstring to string; characters the locale cannot represent become '?'
    vector<char> buf(wstr.size());
    use_facet<ctype<wchar_t>>(loc).narrow(wstr.data(), wstr.data() + wstr.size(), '?', buf.data());
    return string(buf.data(), buf.size());
}

int main(int argc, char* argv[])
{
    string ansi;
    // "á" in UTF-8; a string literal avoids the narrowing error of {0xc3, 0xa1, 0}
    const char utf8txt[] = "\xc3\xa1";

    // I guess you want to use Windows-1252 encoding...
    // (".1252" is an MSVC-style locale name; other platforms name locales differently)
    ansi = utf8_to_string(utf8txt, locale(".1252"));

    // Now do something with the string
    return 0;
}

I don't know what happened to that answer; apparently someone deleted it. But it turns out to be the perfect solution. Whoever posted it, thanks a lot: you deserve the accepted answer and an upvote!
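For anyone who cannot rely on a named locale being installed, the same two steps (decode the UTF-8, then replace whatever does not fit with '?') can be sketched by hand. This is only a minimal sketch under a loud assumption: it targets ISO-8859-1 (Latin-1), which matches Windows-1252 everywhere except the 0x80-0x9F range, so characters such as "€" come out as '?'. The function name utf8_to_latin1 is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original answer): decodes UTF-8 and keeps
// only code points up to U+00FF (ISO-8859-1). Anything else, including
// malformed or overlong input, is replaced with '?'.
std::string utf8_to_latin1(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp = 0;   // decoded code point
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out += '?'; ++i; continue; }  // stray continuation or invalid lead byte

        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }

        // Latin-1 holds ASCII (1 byte in UTF-8) and U+0080..U+00FF (2 bytes).
        const bool fits = ok && cp <= 0xFF &&
                          (extra == 0 || (extra == 1 && cp >= 0x80));
        out += fits ? static_cast<char>(static_cast<unsigned char>(cp)) : '?';
    }
    return out;
}
```

With this sketch, the two bytes 0xC3 0xA1 become the single byte 0xE1, and a Chinese character becomes a single '?'. The result can be written to the file as raw bytes.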

+8

If you mean ASCII, just discard any byte that has bit 7 set; this removes all multibyte sequences. Note that you could write a more sophisticated algorithm, such as stripping the accent from "á" to leave "a", but that would require much more work.
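A minimal sketch of the byte-dropping approach, assuming the input is valid UTF-8 held in a std::string (the name strip_non_ascii is made up for illustration). It works because every byte of a multibyte UTF-8 sequence, lead or continuation, has bit 7 set:

```cpp
#include <string>

// Hypothetical helper: keeps only bytes below 0x80, i.e. plain ASCII.
// Every byte of a multibyte UTF-8 sequence has bit 7 set, so each
// non-ASCII character disappears entirely.
std::string strip_non_ascii(const std::string& utf8)
{
    std::string ascii;
    ascii.reserve(utf8.size());
    for (char c : utf8) {
        if ((static_cast<unsigned char>(c) & 0x80) == 0) {
            ascii += c;
        }
    }
    return ascii;
}
```

Note that characters are dropped rather than replaced, so the UTF-8 bytes for "café!" come out as "caf!".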

+1

Source: https://habr.com/ru/post/1490628/

