How to write a UTF-8 file with fprintf in C++

I program (just occasionally) in C++ with Visual Studio and MFC. I am writing a file with fopen and fprintf, and the file must be encoded in UTF-8. Is there any way to do this? No matter what I try, the file ends up either as double-byte Unicode (UTF-16) or as ISO-8859-2 (Latin-2).

Glanebridge

3 answers

Yes, but you need Visual Studio 2005 or later. Then you can pass a ccs=UTF-8 flag in the fopen mode string:

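    // Unicode build assumed (_UNICODE defined): a stream opened with ccs=UTF-8
    // accepts only wide-character output, so _ftprintf must map to fwprintf.
    LPCTSTR strText = _T("");
    FILE *f = _tfopen(pszFilePath, _T("w,ccs=UTF-8"));
    if (f) { _ftprintf(f, _T("%s"), strText); fclose(f); }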

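Keep in mind that this is a Microsoft C runtime extension; it will probably not work with GCC or other compilers and their runtimes. Also note that a file opened for writing in this mode gets a UTF-8 byte-order mark (BOM) written at the start.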
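Outside the Microsoft C runtime, roughly the same result (UTF-8 text preceded by a BOM) can be produced by hand with plain fprintf. This is a minimal sketch, not part of the answer above: `write_utf8_with_bom` is a hypothetical helper name, and it assumes the text passed in is already UTF-8 encoded. Many tools read UTF-8 fine without a BOM, so only write one if the program consuming the file expects it.

```cpp
#include <cstdio>

// Hypothetical helper: writes a UTF-8 byte-order mark (EF BB BF) followed by
// text that the caller guarantees is already UTF-8 encoded. Uses only
// standard C I/O, so it does not depend on the Microsoft "ccs=" extension.
bool write_utf8_with_bom(const char *path, const char *utf8_text) {
    FILE *f = std::fopen(path, "w");
    if (!f) return false;
    bool ok = std::fputs("\xEF\xBB\xBF", f) >= 0          // the BOM bytes
              && std::fprintf(f, "%s\n", utf8_text) >= 0; // the payload
    return std::fclose(f) == 0 && ok;
}
```

For example, `write_utf8_with_bom("out.txt", "\xE6\x97\xA5")` writes the BOM followed by the UTF-8 encoding of U+65E5 (日) and a newline.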


You do not need to set the locale or set any special mode on the file if you just want to use fprintf. You just need to use UTF-8 encoded strings.
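One way to obtain a UTF-8 encoded string that does not depend on how the compiler reads the source file at all is to spell out the bytes with hex escapes. This is a minimal sketch; `kNihonkokuUtf8` and `write_utf8_line` are illustrative names, and the escapes encode U+65E5 U+672C U+56FD (日本国).

```cpp
#include <cstdio>

// U+65E5 U+672C U+56FD ("日本国") spelled out as UTF-8 bytes, so the result
// does not depend on the source file encoding or the execution character set.
const char kNihonkokuUtf8[] = "\xE6\x97\xA5\xE6\x9C\xAC\xE5\x9B\xBD";

// Hypothetical helper: writes one line of already-UTF-8 text with fprintf.
bool write_utf8_line(const char *path, const char *utf8) {
    FILE *f = std::fopen(path, "w");
    if (!f) return false;
    bool ok = std::fprintf(f, "%s\n", utf8) >= 0;
    return std::fclose(f) == 0 && ok;
}
```

Calling `write_utf8_line("tmp", kNihonkokuUtf8)` produces the same bytes as the answer's examples, just without relying on source-encoding settings.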

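    #include <cstdio>
    #include <codecvt>
    #include <locale>
    #include <string>

    int main() {
        // Convert the wide (UTF-16 on Windows) literal to UTF-8 bytes.
        // std::wstring_convert is deprecated since C++17 but still available.
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
        std::string utf8_string = convert.to_bytes(L"日本国");
        if (FILE *f = std::fopen("tmp", "w")) {
            std::fprintf(f, "%s\n", utf8_string.c_str());
            std::fclose(f);
        }
    }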

Save the source file as UTF-8 with a signature (BOM) or as UTF-16; do not use UTF-8 without a signature, otherwise Visual Studio will not produce the correct wide string literal. The file written by the program will contain the UTF-8 version of this string. Alternatively, you can do:

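    #include <cstdio>

    int main() {
        // The bytes of the narrow literal are written to the file unchanged.
        if (FILE *f = std::fopen("tmp", "w")) {
            std::fprintf(f, "%s\n", "日本国");
            std::fclose(f);
        }
    }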

In this case you should save the file as UTF-8 without a signature, because you want the compiler to assume that the source encoding is the same as the execution encoding, so that it copies the UTF-8 bytes into the narrow literal unchanged. This is a bit of a hack that relies on what is, IMO, broken compiler behavior.
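Because this variant silently depends on how the compiler treats the source file, it can help to make that assumption fail at compile time instead of producing a wrongly encoded file. The sketch below is an illustration, not part of the original answer: it assumes the source is saved as UTF-8, and `write_literal_line` is a hypothetical helper name. On Visual Studio 2015 Update 2 and later, the /utf-8 compiler option sets both the source and execution character sets to UTF-8, which satisfies this check without relying on the signature trick.

```cpp
#include <cstdio>

// Compile-time guard: the narrow literal "日" (U+65E5) must consist of exactly
// the three UTF-8 bytes E6 97 A5 plus the terminating NUL. If the compiler
// converted it to some other code page, compilation stops here.
static_assert(sizeof("日") == 4 &&
              static_cast<unsigned char>("日"[0]) == 0xE6 &&
              static_cast<unsigned char>("日"[1]) == 0x97 &&
              static_cast<unsigned char>("日"[2]) == 0xA5,
              "narrow string literals are not UTF-8 (try /utf-8 on MSVC)");

// Hypothetical helper: writes the literal unchanged; the output is UTF-8
// only because the static_assert above holds.
bool write_literal_line(const char *path) {
    FILE *f = std::fopen(path, "w");
    if (!f) return false;
    bool ok = std::fprintf(f, "%s\n", "日本国") >= 0;
    return std::fclose(f) == 0 && ok;
}
```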

You can do basically the same with any other API that writes narrow characters to a file, but note that none of these methods work for writing UTF-8 to the Windows console. Because the C runtime and/or the console is somewhat broken, you can only write UTF-8 directly to the console by calling SetConsoleOutputCP(65001) and then using one of the puts functions.
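A minimal sketch of that console approach, assuming a Windows console that honours code page 65001 (`CP_UTF8`). `put_utf8_console` is a hypothetical name; the `_WIN32` guard lets the same code compile on other platforms, where terminals typically expect UTF-8 already.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: writes already-UTF-8-encoded text to stdout.
// On Windows the console output code page is switched to UTF-8 (65001)
// first; this changes the setting for the whole console, not just this call.
bool put_utf8_console(const char *utf8_text) {
#ifdef _WIN32
    if (!SetConsoleOutputCP(CP_UTF8)) return false;
#endif
    return std::fputs(utf8_text, stdout) >= 0;
}
```

For example, `put_utf8_console("\xE6\x97\xA5\xE6\x9C\xAC\xE5\x9B\xBD\n")` prints 日本国 followed by a newline.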

If you want to use wide characters instead of narrow characters, then locale-based methods and setting modes on file descriptors come into play.

    #include <cstdio>
    #include <fcntl.h>
    #include <io.h>

    int main() {
        if (FILE *f = std::fopen("tmp", "w")) {
            // Windows-only: the CRT converts wide output on this descriptor to UTF-8.
            _setmode(_fileno(f), _O_U8TEXT);
            fwprintf(f, L"%s\n", L"日本国");
            std::fclose(f);
        }
    }

    #include <codecvt>
    #include <fstream>
    #include <locale>

    int main() {
        if (auto f = std::wofstream("tmp")) {
            // Assumes wchar_t is UTF-16, as on Windows.
            f.imbue(std::locale(std::locale(), new std::codecvt_utf8_utf16<wchar_t>));
            f << L"日本国\n";
        }
    }
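`std::wstring_convert` and `std::codecvt_utf8_utf16`, used in two of the snippets in this answer, are deprecated since C++17. If that is a concern, the conversion can be written by hand. This is a sketch under stated assumptions: `to_utf8` is a hypothetical helper, it takes UTF-32 code points (`std::u32string`, so surrogate pairs never need to be combined), and it replaces surrogates and values above U+10FFFF with U+FFFD rather than reporting an error.

```cpp
#include <string>

// Hypothetical helper: encodes UTF-32 code points as UTF-8 bytes.
// Invalid code points (surrogates, > U+10FFFF) become U+FFFD.
std::string to_utf8(const std::u32string &in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes: 11110xxx + three 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The result can be passed straight to fprintf, for example `fprintf(f, "%s\n", to_utf8(U"\u65E5\u672C\u56FD").c_str());`.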

In theory, you should simply set up a locale that uses UTF-8 as the external encoding. My understanding (I'm not a Windows programmer) is that Windows does not have such a locale, so you need to resort to implementation-specific tools or non-standard libraries (see the link in Dave's comment).
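For what it is worth, newer Windows versions do provide such a locale: starting with Windows 10 version 1803, the Universal C Runtime accepts ".UTF-8" as a locale name. A minimal sketch of the locale-based approach, under assumptions: `write_wide_via_utf8_locale` is a hypothetical helper, ".UTF-8" is the UCRT spelling and "C.UTF-8" a common glibc spelling (other platforms may name it differently or lack it), and the helper switches only `LC_CTYPE`, restoring the previous setting afterwards because the C locale is process-wide.

```cpp
#include <clocale>
#include <cstdio>
#include <string>

// Hypothetical helper: converts a wide string to UTF-8 by temporarily selecting
// a UTF-8 LC_CTYPE locale and letting fprintf's %ls do the wide-to-multibyte
// conversion. Returns false if no UTF-8 locale is available or writing fails.
bool write_wide_via_utf8_locale(const char *path, const wchar_t *text) {
    std::string previous = std::setlocale(LC_CTYPE, nullptr);
    if (!std::setlocale(LC_CTYPE, ".UTF-8") &&   // Windows UCRT spelling
        !std::setlocale(LC_CTYPE, "C.UTF-8"))    // common glibc spelling
        return false;
    bool ok = false;
    if (FILE *f = std::fopen(path, "w")) {
        ok = std::fprintf(f, "%ls\n", text) >= 0;
        ok = (std::fclose(f) == 0) && ok;
    }
    std::setlocale(LC_CTYPE, previous.c_str());  // restore the process-wide setting
    return ok;
}
```

Restoring the old locale keeps the change local to this call, at the cost of not being thread-safe if other threads depend on `LC_CTYPE` at the same time.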


Source: https://habr.com/ru/post/912494/

