In a Linux program I am writing in C with ncurses, I need to read input from stdin as UTF-8. However, when I do this:
wint_t unicode_char=0;
get_wch(&unicode_char);
I get a wide character instead of UTF-8 bytes. When I dump the variable in gdb it looked to me like UTF-16 (it is in fact the 32-bit Unicode code point, i.e. UTF-32). I don't want to convert it to UTF-8 afterwards; I want the input to be UTF-8 all the time, regardless of which Linux distribution runs my program or which language the user has configured. How is this done? Is it possible?
EDIT: Here is a test program and evidence that get_wch() stores a UTF-32 code point (not UTF-16, which is a different encoding) rather than UTF-8 bytes, even though I set up the locale with setlocale().
[niko@dev1 ncurses]$ gcc -g -std=c99 -o getch getch.c $(ncursesw5-config --cflags --libs)
[niko@dev1 ncurses]$ cat getch.c
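#define _GNU_SOURCE
#include <locale.h>
#include <ncursesw/ncurses.h>   /* wide-character build of ncurses */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int ct;
wint_t unichar;
int main(int argc, char *argv[])
{
    setlocale(LC_ALL, "");      /* adopt the user's locale from the environment */
    initscr();
    raw();
    keypad(stdscr, TRUE);
    ct = get_wch(&unichar);     /* OK (character), KEY_CODE_YES (function key) or ERR */
    /* note: row 24 is only visible if the terminal has more than 24 lines */
    mvprintw(24, 0, "Key pressed is = %4x ", unichar);
    refresh();
    getch();
    endwin();
    return 0;
}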
Testing code with GDB:
🔎
Breakpoint 1, main (argc=1, argv=0x7fffffffded8) at getch.c:18
18 mvprintw(24, 0, "Key pressed is = %4x ", unichar);
Missing separate debuginfos, use: dnf debuginfo-install ncurses-libs-5.9-21.20150214.fc23.x86_64
(gdb) print unichar
$1 = 128270
(gdb) print/x ((unsigned short*) (&unichar))[0]
$2 = 0xf50e
(gdb) print/x ((unsigned short*) (&unichar))[1]
$3 = 0x1
(gdb) print/x ((unsigned char*) (&unichar))[0]
$4 = 0xe
(gdb) print/x ((unsigned char*) (&unichar))[1]
$5 = 0xf5
(gdb) print/x ((unsigned char*) (&unichar))[2]
$6 = 0x1
(gdb) print/x ((unsigned char*) (&unichar))[3]
$7 = 0x0
(gdb)
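A minimal sketch that reproduces the byte dump above, assuming a 32-bit wint_t on little-endian x86_64 as in this gdb session (the helper names are hypothetical). It shows that the four bytes 0e f5 01 00 are just the integer 0x0001F50E stored low byte first, i.e. one UTF-32 code unit:

```c
#include <stdint.h>

/* Hypothetical helper: split a 32-bit value into bytes, least
   significant byte first. This is how x86_64 lays out unichar in
   memory, which is why gdb printed 0e f5 01 00. */
void split_le32(uint32_t value, unsigned char out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(value >> (8 * i));
}

/* Hypothetical helper: rebuild the value from the two 16-bit halves
   gdb printed through the (unsigned short *) casts (low 0xf50e,
   high 0x1). */
uint32_t join_le16_halves(uint16_t low, uint16_t high)
{
    return ((uint32_t)high << 16) | low;
}
```

split_le32(128270, b) fills b with {0x0e, 0xf5, 0x01, 0x00}, and join_le16_halves(0xf50e, 0x1) gives 0x1F50E. For comparison, a real UTF-16 encoding of U+1F50E would be the surrogate pair D83D DD0E, not 0xf50e followed by 0x1.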
The character is 🔎 (U+1F50E, decimal 128270), which in UTF-8 is "f09f948e", see: http://www.fileformat.info/info/unicode/char/1f50e/index.htm
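To make the link between the value in gdb and the UTF-8 bytes concrete, here is a minimal hand-written encoder. It is a hypothetical helper, not part of ncurses, and it assumes the input is a Unicode code point such as the one get_wch() stores in a wint_t. Applied to 0x1F50E it yields f0 9f 94 8e:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes 1 to 4 bytes into out and returns the count, or returns 0
   for UTF-16 surrogates and values above U+10FFFF, which have no
   UTF-8 encoding. */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogate range is invalid */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes, e.g. U+1F50E */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```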
How can I get UTF-8 from get_wch()?
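If converting after the fact turns out to be acceptable, the C library can do it without manual bit manipulation: once setlocale() has selected a UTF-8 locale, wcrtomb() turns a wide character into that locale's multibyte bytes. A hedged sketch with a hypothetical helper name; the result is UTF-8 only when the active locale's encoding is UTF-8:

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>   /* setlocale() must have been called beforehand */
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert one wide character (e.g. the value
   get_wch() stored) into the current locale's multibyte encoding.
   Returns the number of bytes written to out, or 0 if the character
   cannot be represented in that locale. */
size_t wide_to_multibyte(wint_t wc, char out[MB_LEN_MAX])
{
    mbstate_t state;
    memset(&state, 0, sizeof state);      /* start in the initial shift state */
    size_t n = wcrtomb(out, (wchar_t)wc, &state);
    return n == (size_t)-1 ? 0 : n;
}
```

In the test program this would be called after get_wch() returns OK, passing unichar; for 🔎 under a UTF-8 locale it yields the four bytes f0 9f 94 8e.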
P.S. I am linking with '-lncursesw', not '-lncurses'.