Extract correct text from wifstream regardless of encoding

Here is the program: http://codepad.org/eyxunHot
File Encoding - UTF-8.

I have a text file called "config.ini" with the following word: ➑ball

If I use Notepad to save the file with the "UTF-8" encoding and then run the program, the debugger shows the value of eight_ball as: ï»¿âball

If I use Notepad to save the file with the "Unicode" encoding and then run the program, the debugger shows the value of eight_ball as: ÿþ'b

If I use Notepad to save the file with the "Unicode big endian" encoding and then run the program, the debugger shows the value of eight_ball as: Thy

In all of these cases the result is incorrect. The ANSI encoding does not support the ➑ character at all. How can I make sure that the word ➑ball is extracted from the file when I do config_file >> eight_ball, regardless of the encoding? I want the output of this program to be "Program is correct" regardless of the encoding of config.ini.

+3
3 answers

If you are on Windows and want to use INI files, keep in mind that the INI APIs support Unicode (UTF-16 little endian) INI files without problems. You just need to start from an empty file that contains only the UTF-16 LE byte order mark.
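As an illustration of the "empty file to start with" idea, here is a minimal sketch that creates a file containing nothing but the UTF-16 LE byte order mark (bytes FF FE). It assumes the answer is referring to the Win32 profile functions such as GetPrivateProfileStringW and WritePrivateProfileStringW; the helper name create_utf16le_seed is hypothetical and is not part of any API.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: create (or truncate) a file that holds only the
// UTF-16 little endian byte order mark, FF FE. The assumption is that the
// Windows INI functions will then keep treating the file as UTF-16 LE.
bool create_utf16le_seed(const std::string& path) {
    // Binary mode so the two BOM bytes are written exactly as given.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write("\xFF\xFE", 2);
    // flush() returns the stream; its bool value reports whether the write succeeded.
    return static_cast<bool>(out.flush());
}
```

Calling create_utf16le_seed("config.ini") once, before the first write through the INI API, would be the intended use under that assumption.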

[The rest of this answer did not survive extraction; the remaining fragments mention C++, Unicode, and UTF8.]

+1

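[Most of this answer did not survive extraction. The remaining fragments mention ICU, UTF on Windows, UTF-8 on Ubuntu and Unix, C++, and char * strings holding UTF-8.]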

+1

You need to set the locale before wide streams will work correctly. Instead, I suggest using regular (narrow) streams together with a character conversion library, since your input encoding will usually vary. These days, the best approach is to try reading the file as UTF-8 first and, if that fails, fall back to CP1252 or some other user-configurable encoding.
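The "try UTF-8 first, then fall back" idea can be sketched roughly as follows. This is only an illustration under some assumptions: the file is read as raw bytes through a narrow stream, a byte order mark (if present) selects UTF-16 LE or BE, the UTF-8 check is structural only (it does not reject overlong forms), and the fallback is hard-coded to CP1252 rather than configurable. It uses hand-written decoding in place of the conversion library the answer suggests, and all function names (read_file, decode_text, and so on) are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// CP1252 differs from Latin-1 only in 0x80..0x9F; unassigned bytes map to themselves.
static const char32_t kCp1252[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

// Decode UTF-16 starting at byte offset `pos` (just past the BOM).
static std::u32string decode_utf16(const std::string& s, std::size_t pos, bool big_endian) {
    auto unit = [&](std::size_t i) -> char32_t {
        unsigned char a = static_cast<unsigned char>(s[i]);
        unsigned char b = static_cast<unsigned char>(s[i + 1]);
        return static_cast<char32_t>(big_endian ? (a << 8) | b : (b << 8) | a);
    };
    std::u32string out;
    for (std::size_t i = pos; i + 1 < s.size(); i += 2) {
        char32_t u = unit(i);
        // Combine a high/low surrogate pair into a single code point.
        if (u >= 0xD800 && u <= 0xDBFF && i + 3 < s.size()) {
            char32_t lo = unit(i + 2);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        out.push_back(u);
    }
    return out;
}

// Structural UTF-8 decode; returns false on the first malformed sequence.
static bool decode_utf8(const std::string& s, std::size_t pos, std::u32string& out) {
    for (std::size_t i = pos; i < s.size();) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        char32_t cp;
        if (c < 0x80) { extra = 0; cp = c; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false;
        if (i + extra >= s.size()) return false;  // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char t = static_cast<unsigned char>(s[i + k]);
            if ((t & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (t & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return true;
}

// BOM check first, then UTF-8, then the CP1252 fallback.
std::u32string decode_text(const std::string& bytes) {
    auto starts_with = [&](const char* bom, std::size_t n) {
        return bytes.compare(0, n, bom, n) == 0;
    };
    if (starts_with("\xFF\xFE", 2)) return decode_utf16(bytes, 2, false);
    if (starts_with("\xFE\xFF", 2)) return decode_utf16(bytes, 2, true);
    std::size_t skip = starts_with("\xEF\xBB\xBF", 3) ? 3 : 0;
    std::u32string out;
    if (decode_utf8(bytes, skip, out)) return out;
    out.clear();
    for (unsigned char c : bytes)
        out.push_back((c >= 0x80 && c <= 0x9F) ? kCp1252[c - 0x80] : static_cast<char32_t>(c));
    return out;
}

// Read a whole file as raw bytes through a regular (narrow) stream.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

With something like this, decode_text(read_file("config.ini")) should yield the same code points for the UTF-8, "Unicode", and "Unicode big endian" variants that Notepad saves, since Notepad writes a byte order mark in each of those cases. Splitting the result into words, as operator>> would, is left out of the sketch.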

0

Source: https://habr.com/ru/post/1732721/

