C - Avoiding diacritic / accent issues

Question


I am creating a tiny guessing program for the capitals of countries. Some capitals have accents, cedillas, etc.

Since I have to compare the capital with the text the user typed, and I do not want diacritics to break the comparison, I searched the Internet for a way to handle them.

I came across countless solutions for other programming languages, but only a few results about C.

None of them worked for me. I did, however, come to the conclusion that I would have to use wchar.h to work with these annoying characters.

I wrote this small piece of code (which replaces É with E) to test the method and, contrary to everything I read, found that it does not work: even when printing a wide-character string, the diacritics are not displayed. If I can get this working, I'm sure I can implement it in the capitals program, so I would appreciate it if someone could tell me what is going wrong.

#include<stdio.h>
#include<locale.h>
#include<wchar.h>

const wchar_t CAPITAL_ACCUTE_E = L'\u00C9';

int main()
{
    wchar_t wbuff[128];
    setlocale(LC_ALL,"");
    fputws(L"Say something: ", stdout);
    fgetws(wbuff, 128, stdin);
    int n;
    int len = wcslen(wbuff);
    for(n=0;n<len;n++)
        if(wbuff[n] == CAPITAL_ACCUTE_E)
            wbuff[n] = L'E';
    wprintf(L"%ls\n", wbuff);
    return 0;
}


c diacritics wchar

Cláudio pinto Jul 17 '16 at 21:02


1 answer

a3f · Answer 1 · 2016-07-17T23:04:42+0000

What you are overlooking is that É can be represented in two ways:

É - LATIN CAPITAL LETTER E WITH ACUTE, code point U+00C9 (c3 89 in UTF-8), or
É - LATIN CAPITAL LETTER E followed by COMBINING ACUTE ACCENT, code points U+0045 U+0301 (45 cc 81 in UTF-8)

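What you need to do is normalize the input. If you normalize to NFD (Normalization Form D: canonical decomposition), every accented letter is split into a base letter followed by combining marks, and you can then strip the marks. You are left with a plain E, which you can compare with strcmp, for example.

Assuming UTF-8 input, you can do this with utf8proc: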

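#include <stdlib.h>
#include <utf8proc.h>

/* input: NUL-terminated UTF-8 string; with UTF8PROC_NULLTERM the length argument is ignored */
utf8proc_uint8_t *output;
utf8proc_ssize_t len = utf8proc_map((const utf8proc_uint8_t *)input, 0, &output,
                           UTF8PROC_NULLTERM | UTF8PROC_STABLE |
                           UTF8PROC_STRIPMARK | UTF8PROC_DECOMPOSE |
                           UTF8PROC_CASEFOLD
                          );
/* len < 0 is an error code; otherwise free(output) once you are done with it */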

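This turns both É (precomposed) and É (E followed by a combining acute accent) into the same unaccented letter. Because UTF8PROC_CASEFOLD is set, the result is also case-folded, so both come out as e rather than E.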
