C - Avoiding diacritic / accent issues

I am creating a tiny guessing program for the capitals of countries. Some capitals have accents, cedillas, etc.

Since I have to compare the capital with the text the user typed, and I do not want accents to spoil the comparison, I searched the Internet for a way to do it.

I came across countless solutions for other programming languages, but only a few for C.

None of them worked for me. Still, I concluded that I would have to use the wchar.h header to handle these annoying characters.

I wrote this small piece of code (which replaces É with E) to test the approach, but contrary to everything I read, it does not work: even when I print a wide-character string, the diacritics are not displayed. If I get this working, I am sure I can use it in the capitals program, so I would appreciate it if someone could tell me what is going on.

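#include <stdio.h>
#include <locale.h>
#include <wchar.h>

/* U+00C9 LATIN CAPITAL LETTER E WITH ACUTE (precomposed form) */
const wchar_t CAPITAL_ACUTE_E = L'\u00C9';

int main(void)
{
    wchar_t wbuff[128];
    setlocale(LC_ALL, "");                  /* use the terminal's encoding */
    fputws(L"Say something: ", stdout);
    if (fgetws(wbuff, 128, stdin) == NULL)  /* read error or EOF */
        return 1;
    size_t len = wcslen(wbuff);
    if (len > 0 && wbuff[len - 1] == L'\n') /* drop the newline fgetws keeps */
        wbuff[--len] = L'\0';
    for (size_t n = 0; n < len; n++)
        if (wbuff[n] == CAPITAL_ACUTE_E)    /* matches only the precomposed É */
            wbuff[n] = L'E';
    wprintf(L"%ls\n", wbuff);
    return 0;
}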
1 answer

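You are not taking into account that É can be represented in two ways: as the single precomposed code point U+00C9, or as the letter E followed by the combining acute accent U+0301.

The second form is what the Unicode normalization form NFD (canonical decomposition) produces. If you convert both strings to NFD and then drop the combining marks, only the base letter E remains, and the strings can be compared with an ordinary strcmp.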
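To see why a plain comparison fails, here is a minimal sketch using only the standard wide-character functions. It assumes wchar_t holds whole code points (true for these BMP characters on both Linux and Windows). The helper name strip_marks is hypothetical, and it only removes characters in the Combining Diacritical Marks block U+0300 to U+036F, so on its own it helps only with input that is already decomposed; turning a precomposed U+00C9 into E + U+0301 is exactly what a library such as utf8proc does for you.

```c
#include <stddef.h>
#include <wchar.h>

/* Precomposed form: one code point, U+00C9 LATIN CAPITAL LETTER E WITH ACUTE. */
static const wchar_t E_ACUTE_NFC[] = L"\u00C9";
/* Decomposed (NFD) form: E followed by U+0301 COMBINING ACUTE ACCENT. */
static const wchar_t E_ACUTE_NFD[] = L"E\u0301";

/* Hypothetical helper: copy src into dst, dropping combining marks in
   U+0300..U+036F. dst must have room for wcslen(src) + 1 wide chars. */
static void strip_marks(wchar_t *dst, const wchar_t *src)
{
    for (; *src != L'\0'; src++) {
        if (*src >= 0x0300 && *src <= 0x036F)
            continue; /* skip the accent, keep the base letter */
        *dst++ = *src;
    }
    *dst = L'\0';
}
```

Both strings display as É, yet wcscmp reports them as different (they even have different lengths, 1 and 2). Stripping marks reduces the decomposed one to a bare E, while the precomposed one passes through unchanged, which is why decomposition has to happen first.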

For example, for UTF-8 input you can use the utf8proc library:

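#include <stdlib.h>
#include <utf8proc.h>

/* input: a NUL-terminated UTF-8 string, e.g. the user's guess */
utf8proc_uint8_t *output = NULL;
utf8proc_ssize_t len = utf8proc_map((const utf8proc_uint8_t *)input, 0, &output,
                                    UTF8PROC_NULLTERM | UTF8PROC_STABLE |
                                    UTF8PROC_STRIPMARK | UTF8PROC_DECOMPOSE |
                                    UTF8PROC_CASEFOLD);
if (len < 0) {
    /* error: utf8proc_errmsg(len) returns a description */
} else {
    /* output now holds the normalized string; compare it with strcmp */
    free(output); /* utf8proc_map allocates output with malloc */
}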

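With these options, both the precomposed É and the decomposed É (E + U+0301) are mapped to the same plain letter; because UTF8PROC_CASEFOLD is also set, the result is the lowercase e, so É, E and e all compare equal.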
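If adding utf8proc is not an option, a much cruder fallback for the quiz is to fold both strings yourself before comparing. This sketch is an approximation under stated assumptions, not Unicode normalization: it maps only a handful of precomposed Latin-1 letters through a hypothetical, deliberately incomplete table, skips combining marks U+0300 to U+036F, and lowercases everything else with towlower. The names FOLD_TABLE, fold_char and same_capital are illustrative only, and code points above U+FFFF are not considered.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical, deliberately incomplete table: precomposed Latin-1
   capital letter -> lowercase base letter. Extend it for your capitals. */
static const struct { wchar_t from, to; } FOLD_TABLE[] = {
    {0x00C0, L'a'}, {0x00C1, L'a'}, {0x00C2, L'a'}, {0x00C3, L'a'},
    {0x00C7, L'c'}, {0x00C8, L'e'}, {0x00C9, L'e'}, {0x00CA, L'e'},
    {0x00CD, L'i'}, {0x00D3, L'o'}, {0x00D4, L'o'}, {0x00D5, L'o'},
    {0x00DA, L'u'}, {0x00DC, L'u'},
};

/* Fold one code point: table lookup for accented letters, else towlower. */
static wchar_t fold_char(wchar_t c)
{
    /* In Latin-1, lowercase accented letters sit 0x20 above their capitals
       (U+00E0..U+00FE, except U+00F7 DIVISION SIGN). */
    if (c >= 0x00E0 && c <= 0x00FE && c != 0x00F7)
        c -= 0x20;
    for (size_t i = 0; i < sizeof FOLD_TABLE / sizeof FOLD_TABLE[0]; i++)
        if (FOLD_TABLE[i].from == c)
            return FOLD_TABLE[i].to;
    return (wchar_t)towlower((wint_t)c);
}

/* Compare two wide strings ignoring case, the listed accents, and
   combining marks U+0300..U+036F. Returns 1 on a match, 0 otherwise. */
static int same_capital(const wchar_t *a, const wchar_t *b)
{
    for (;;) {
        /* Skip combining marks on both sides (decomposed input). */
        while (*a >= 0x0300 && *a <= 0x036F) a++;
        while (*b >= 0x0300 && *b <= 0x036F) b++;
        if (*a == L'\0' || *b == L'\0')
            return *a == *b; /* match only if both ended together */
        if (fold_char(*a) != fold_char(*b))
            return 0;
        a++;
        b++;
    }
}
```

In the quiz you would call same_capital(wbuff, capital) after reading the guess with fgetws and removing the trailing newline. The table approach only scales to a fixed, known list of capitals; for arbitrary Unicode input, the utf8proc normalization is the more robust choice.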


Source: https://habr.com/ru/post/1648175/

