Reading UTF-8 characters from the console

I am trying to read UTF-8 encoded characters from the console in my C++ application. I am sure the console uses this code page (I checked in its properties). What I have already tried:

  • Using cin: instead of "zażółć" I read "za\0\0\0\0"
  • Using wcin: same result as with cin
  • Using scanf: instead of "zażółć\0" I read "za\0\0\0\0\0"
  • Using wscanf: same result as with scanf
  • Using getchar to read characters one by one: same result as with scanf

At the beginning of the main function, I have the following lines:

setlocale(LC_ALL, "PL_pl.UTF-8");
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);

I would be very grateful for any help.

+4
2 answers

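Here is the approach I use for UTF-8 support. The result is a multibyte (UTF-8) string that can then be used elsewhere: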

#include <cstdio>
#include <windows.h>
#define MAX_INPUT_LENGTH 255

int main()
{

    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);

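    wchar_t wstr[MAX_INPUT_LENGTH];
    // Each UTF-16 code unit needs at most 3 UTF-8 bytes; +1 for the terminator.
    char mb_str[MAX_INPUT_LENGTH * 3 + 1];

    DWORD read = 0;
    HANDLE con = GetStdHandle(STD_INPUT_HANDLE);

    // Read UTF-16 straight from the console (explicitly the wide-character
    // variant). This fails if stdin is redirected and is not a console.
    if (!ReadConsoleW(con, wstr, MAX_INPUT_LENGTH, &read, NULL))
        return 1;

    // Convert the UTF-16 input (it still ends with "\r\n") to UTF-8.
    int size = WideCharToMultiByte(CP_UTF8, 0, wstr, (int)read, mb_str, sizeof(mb_str) - 1, NULL, NULL);
    mb_str[size] = 0;

    std::printf("ENTERED: %s\n", mb_str);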

    return 0;
}

Result:

[screenshot of the console output]
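
Because the result is a multibyte string, its length in bytes is not its length in characters, which matters when sizing buffers such as mb_str above. A minimal sketch, assuming well-formed UTF-8 input; the names utf8_code_points and usage_example_count are hypothetical and not part of the answer's code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: the input is well-formed UTF-8.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80)  // not a continuation byte: a new code point starts
            ++count;
    }
    return count;
}

// Usage: the word from the question, written out as UTF-8 bytes.
void usage_example_count()
{
    const std::string s = "za\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87";  // "zażółć"
    assert(s.size() == 10);            // 10 bytes ...
    assert(utf8_code_points(s) == 6);  // ... but only 6 characters
}
```

The four non-ASCII letters take two bytes each, which is why a buffer sized by character count would be too small.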


+3

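Here is a more portable version that stays closer to the standard library. Unfortunately, this is an area where I've found that many widely used implementations do not support things that are supposed to be in the standard. For example, there is supposed to be a standard way to print multibyte strings (which in theory could be something unusual like shift-JIS, but in practice are UTF-8), yet it does not work portably. Microsoft's runtime library is especially poor in this respect, but I've also found bugs in libc++.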

/* Boilerplate feature-test macros: */
#if _WIN32 || _WIN64
#  define _WIN32_WINNT  0x0A00 // _WIN32_WINNT_WIN10
#  define NTDDI_VERSION 0x0A000002 // NTDDI_WIN10_RS1
#  include <sdkddkver.h>
#else
#  define _XOPEN_SOURCE     700
#  define _POSIX_C_SOURCE   200809L
#endif

#include <iostream>
#include <locale>
#include <locale.h>
#include <stdlib.h>
#include <string>

#ifndef MS_STDLIB_BUGS // Allow overriding the autodetection.
/* The Microsoft C and C++ runtime libraries that ship with Visual Studio, as
 * of 2017, have a bug: neither stdio, iostreams, nor wide iostreams can
 * handle Unicode input or output.  Windows needs some non-standard magic to
 * work around that.  This includes programs compiled with MinGW and Clang
 * for the win32 and win64 targets.
 *
 * NOTE TO USERS OF TDM-GCC: This code is known to break on tdm-gcc 4.9.2. As
 * a workaround, "-D MS_STDLIB_BUGS=0" will at least get it to compile, but
 * Unicode output will still not work.
 */
#  if ( _MSC_VER || __MINGW32__ || __MSVCRT__ )
    /* This code is being compiled either on MS Visual C++, or MinGW, or
     * clang++ in compatibility mode for either, or is being linked to the
     * msvcrt (Microsoft Visual C RunTime) library.
     */
#    define MS_STDLIB_BUGS 1
#  else
#    define MS_STDLIB_BUGS 0
#  endif
#endif

#if MS_STDLIB_BUGS
#  include <io.h>
#  include <fcntl.h>
#endif

using std::endl;
using std::istream;
using std::wcin;
using std::wcout;

void init_locale(void)
// Does magic so that wcout can work.
{
#if MS_STDLIB_BUGS
  // Windows needs a little non-standard magic.
  constexpr char cp_utf16le[] = ".1200";
  setlocale( LC_ALL, cp_utf16le );
  _setmode( _fileno(stdout), _O_WTEXT );
  _setmode( _fileno(stdin), _O_WTEXT );
#else
  // The correct locale name may vary by OS, e.g., "en_US.utf8".
  constexpr char locale_name[] = "";
  setlocale( LC_ALL, locale_name );
  std::locale::global(std::locale(locale_name));
  wcout.imbue(std::locale());
  wcin.imbue(std::locale());
#endif
}

int main(void)
{
  init_locale();

  static constexpr size_t bufsize = 1024;
  std::wstring input;
  input.reserve(bufsize);

  while ( wcin >> input )
    wcout << input << endl;

  return EXIT_SUCCESS;
}

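This reads wide-character input, whatever code page the console is using. If what you actually have is a stream of UTF-8 bytes (for example, input redirected from a file saved as UTF-8), the standard way to convert UTF-8 to wchar_t is through the facets in <codecvt> and <locale>, but on Windows that has not worked reliably, so you end up converting the bytes yourself. Another standard option is mbstowcs(), which converts according to the current C locale. I also have conversion code written for STL iterators, but the library functions are usually enough. You may need this conversion anyway, for example when you read or write a file in UTF-8.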
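
Converting UTF-8 bytes by hand can be sketched as below. This is not mbstowcs() or a <codecvt> facet but a small hand-written decoder, chosen because it does not depend on the active locale. The names utf8_to_utf32 and usage_example_decode are hypothetical, and the output is std::u32string rather than wchar_t so the result is the same on every platform. For brevity it does not reject overlong encodings or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-32 code points.
// Throws std::invalid_argument on truncated or malformed sequences.
std::u32string utf8_to_utf32(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i - 1 < extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Usage: "zaż" as UTF-8 bytes decodes to three code points.
void usage_example_decode()
{
    const std::u32string r = utf8_to_utf32("za\xC5\xBC");
    assert(r.size() == 3);
    assert(r[2] == 0x017C);  // U+017C, LATIN SMALL LETTER Z WITH DOT ABOVE
}
```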

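Some people recommend always storing strings as UTF-8 internally, even when working with an API such as Windows that is based on some form of UTF-16, and converting only at the point where you call the API. UTF-8 has real advantages for portable code. In particular, UTF-8 works the same everywhere, whereas you cannot assume that wchar_t holds UTF-32: it is 16 bits wide on Windows.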
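
The difference between UTF-16 and UTF-32 shows up for code points above U+FFFF, which UTF-16 stores as a surrogate pair. A minimal sketch of that encoding step follows; the name utf32_to_utf16 is hypothetical, and on Windows itself one would normally call MultiByteToWideChar at the API boundary rather than hand-roll this.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes UTF-32 code points as UTF-16 code units.
// Assumption: every input value is a valid Unicode scalar value
// (at most 0x10FFFF and not itself a surrogate).
std::u16string utf32_to_utf16(const std::u32string& in)
{
    std::u16string out;
    for (char32_t cp : in) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one 16-bit unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: split the 20 bits above 0x10000
            // into a high and a low surrogate.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

With this sketch, U+017C stays a single unit, while U+1F600 becomes the pair 0xD83D 0xDE00. That is why code that treats one 16-bit wchar_t as one character breaks on such input.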

+3

Source: https://habr.com/ru/post/1691948/

