Reading Unicode Files

I have a problem reading and using content from Unicode files.

I am working on a Unicode release build, and I am trying to read the contents of a Unicode file, but the data contains strange characters and I cannot find a way to convert it to ASCII.

I use fgets. I have tried fgetws, WideCharToMultiByte and many other functions that I found in other articles and posts, but nothing worked.

6 answers

Since you mention WideCharToMultiByte, I assume that you are dealing with Windows.

"read contents from unicode file ... find a way to convert data to ASCII"

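This may be a problem. If you convert Unicode to ASCII (or any other legacy code page), you risk corrupting or losing data. Since you are working on a Unicode release build, you will want to read Unicode and stay Unicode.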

So your final buffer should be wchar_t (or WCHAR, or CStringW, which are built on wchar_t).

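Your file is probably UTF-16 or UTF-8 (UTF-32 is quite rare). For UTF-16 the endianness matters as well. If the file starts with a BOM, that helps a lot.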

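Quick steps:

  • open the file with _wopen or _wfopen in binary mode and read the first bytes to identify the encoding from the BOM
  • if the encoding is UTF-8, read into a byte array and convert to wchar_t with MultiByteToWideChar and CP_UTF8
  • if the encoding is UTF-16BE (big endian), read into a wchar_t array and swap the bytes with _swab
  • if the encoding is UTF-16LE (little endian, the native Windows order), read into a wchar_t array and you are done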
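_swab and MultiByteToWideChar are Windows-only. As a portable sketch of what the two UTF-16 branches above amount to, the hypothetical helper decode_utf16 below turns UTF-16 bytes of either byte order into Unicode code points. It assumes the BOM has already been stripped, stores code points as uint32_t rather than wchar_t to sidestep the 2-versus-4-byte difference, and does not cover the UTF-8 branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-16 bytes (BOM already removed) into Unicode code points.
   big_endian selects the byte order; out must hold at least len / 2 entries.
   Returns the number of code points written. Unpaired surrogates become
   U+FFFD and a trailing odd byte is ignored.
   decode_utf16 is a hypothetical helper name. */
static size_t decode_utf16(const unsigned char *in, size_t len, int big_endian,
                           uint32_t *out)
{
    size_t n = 0;
    size_t i = 0;
    while (i + 1 < len) {
        /* Assemble one 16-bit code unit in the requested byte order. */
        uint32_t u = big_endian ? ((uint32_t)in[i] << 8) | in[i + 1]
                                : ((uint32_t)in[i + 1] << 8) | in[i];
        i += 2;
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < len) {
            /* High surrogate: try to combine it with the next low surrogate. */
            uint32_t lo = big_endian ? ((uint32_t)in[i] << 8) | in[i + 1]
                                     : ((uint32_t)in[i + 1] << 8) | in[i];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                out[n++] = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
                continue;
            }
        }
        if (u >= 0xD800 && u <= 0xDFFF)
            u = 0xFFFD;  /* unpaired surrogate: use the replacement character */
        out[n++] = u;
    }
    return n;
}
```

For example, the little-endian bytes 3D D8 00 DE form the surrogate pair D83D DE00 and decode to the single code point U+1F600.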

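Also, if you use a recent Visual Studio, you can take advantage of a Microsoft extension to _wfopen: the mode string can carry an encoding, as in _wfopen(L"newfile.txt", L"r, ccs=<encoding>"), where the encoding is UTF-8 or UTF-16LE. When an existing file is opened for reading this way, a BOM in the file, if present, determines the encoding.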

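Warning: cross-platform code is problematic, since wchar_t can be 2 or 4 bytes, the conversion routines are not portable, ...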


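You don't give many details (in your Unicode build, do you store text as char or wchar_t? Which encoding is the file in?), but if the file is in a Unicode encoding that your locale understands, fgetws should do the job.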

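The C library's model is that inside the program you work with wide characters, while files use a multibyte encoding. On input, the library converts from the file's multibyte encoding to wide characters (as mbtowc does); on output, it converts from wide characters back to the multibyte encoding (as wctomb does).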


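"Unicode" on its own is not a single encoding. A Unicode file may be UTF-8 or UTF-16, for instance, and in UTF-16 each code unit is two consecutive bytes whose order can be big-endian or little-endian.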

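The BOM (byte order mark) at the start of a UTF-16 file tells you which: FF FE means little-endian, FE FF means big-endian.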
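A minimal sketch of BOM sniffing in C, assuming the first few bytes of the file are already in a buffer. The names detect_bom and bom_encoding are made up for illustration. It also recognises the UTF-8 BOM (EF BB BF); a missing BOM only means the encoding has to be known or guessed some other way.

```c
#include <stddef.h>

/* Encodings that can be recognised from a byte order mark (BOM). */
typedef enum {
    BOM_NONE,     /* no BOM: the encoding must be known or guessed */
    BOM_UTF8,     /* EF BB BF */
    BOM_UTF16LE,  /* FF FE */
    BOM_UTF16BE   /* FE FF */
} bom_encoding;

/* Look at the first bytes of a file and report the encoding its BOM implies.
   *bom_len receives how many bytes to skip before the text starts.
   Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE here;
   this sketch does not handle UTF-32. */
static bom_encoding detect_bom(const unsigned char *buf, size_t len,
                               size_t *bom_len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return BOM_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return BOM_UTF16BE;
    }
    *bom_len = 0;
    return BOM_NONE;
}
```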


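If you are using C++, why use fgets and fgetws rather than IOStreams? Are you using C++ or C? Either way, you first have to set the locale.

In C: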

#include <locale.h>
setlocale(LC_ALL, ""); /* at least LC_CTYPE */

In C++:

#include <locale>
std::locale::global(std::locale(""));

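Then wide-character IO (wide iostreams, fgetws) will work if your environment is correctly set up for Unicode. If it is not, you will have to change your environment (I don't know how that works on Windows; on Unix, set the LC_ALL variable, and run locale -a to see the supported values). Alternatively, you could pass a locale name instead of the empty string, but then the locale is hard-coded in your program, and your users may not appreciate that.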
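A sketch of the C route, assuming setlocale(LC_ALL, "") has already been called at startup as in the snippet above. The helper read_first_line is hypothetical. fgetws decodes the file using the LC_CTYPE category of the current locale, so for a UTF-8 file the environment must select a UTF-8 locale; plain ASCII reads correctly in any locale.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Read the first line of a text file into a wide-character buffer.
   fgetws converts the file's bytes using the LC_CTYPE category of the
   current locale, so call setlocale(LC_ALL, "") (or LC_CTYPE) first and
   make sure that locale's encoding matches the file.
   read_first_line is a hypothetical helper; returns 0 on success, -1 on error. */
static int read_first_line(const char *path, wchar_t *buf, int size)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    wchar_t *line = fgetws(buf, size, f);  /* the stream becomes wide-oriented */
    fclose(f);
    return line != NULL ? 0 : -1;
}
```

For example, after setlocale(LC_ALL, ""), a file whose first line is "hello" leaves L"hello\n" in buf.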

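If your system does not provide a suitable locale, in C++ you can write your own conversion facet, but that is beyond the scope of this answer.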


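First, make sure your Unicode file really is UTF-8 (or find out which encoding it actually uses). To check, you can open it in an editor such as Notepad++.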

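Second, the way to read it depends on the framework you use. If you use Qt, QFile (together with QTextStream) can read Unicode files for you.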

If you are not using such a framework, you can use a small library for UTF-8 handling, for example: http://utfcpp.sourceforge.net/.

For background on Unicode, see http://en.wikipedia.org/wiki/Unicode.


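If your Unicode file is encoded as UTF-8, its ASCII text is stored unchanged, because UTF-8 is a superset of ASCII. The other characters (probably the "strange" ones you see) have no ASCII equivalent, so converting Unicode to ASCII necessarily loses them.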
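To make that loss concrete, here is a hypothetical utf8_to_ascii helper (a sketch for illustration, not a recommended conversion). It assumes well-formed UTF-8 input: ASCII bytes pass through unchanged, and every other character collapses to a single '?'.

```c
#include <stddef.h>

/* Copy a UTF-8 string into out as plain ASCII. Bytes below 0x80 are ASCII
   and are copied as-is; each non-ASCII character is replaced by one '?',
   so the original text cannot be recovered. out must hold at least
   strlen(in) + 1 bytes. utf8_to_ascii is a hypothetical helper name.
   Returns the length of the resulting ASCII string. */
static size_t utf8_to_ascii(const char *in, char *out)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; ++p) {
        if (*p < 0x80)
            out[n++] = (char)*p;   /* ASCII byte: identical in UTF-8 */
        else if ((*p & 0xC0) != 0x80)
            out[n++] = '?';        /* lead byte of a multibyte character */
        /* continuation bytes (10xxxxxx) are skipped */
    }
    out[n] = '\0';
    return n;
}
```

For example, "caf\xC3\xA9" ("café") becomes "caf?", and "\xE2\x82\xAC 5" ("€ 5") becomes "? 5".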


Source: https://habr.com/ru/post/1712788/

