Converting UTF-8 to ASCII using the ICU library

I have a std::string containing UTF-8 characters.
I want to convert the string to its closest equivalent using only ASCII characters.

For instance:

Łódź => Lodz
Assunção => Assuncao
Schloß => Schloss

Unfortunately, the ICU library is really unintuitive, and I have not found good documentation on its use, so it would take me too much time to learn how to use it properly. Time that I don't have.

Can someone give a small example of how this can be done?
Thanks.

+4
5 answers

I don't know about ICU, but iconv does this and is pretty easy to learn. It takes only about 3-4 calls, and in your case you need to set the ICONV_SET_TRANSLITERATE flag using iconvctl().
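A minimal sketch of the iconv route, for illustration only. It assumes an iconv implementation that accepts the "//TRANSLIT" suffix on the target encoding name (glibc and GNU libiconv do); iconvctl() itself is, as far as I know, specific to GNU libiconv, which is why the suffix is used here instead. The function name utf8_to_ascii_iconv is hypothetical, and the replacements chosen for non-ASCII characters depend on the iconv implementation and the current locale.

```cpp
#include <iconv.h>

#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts UTF-8 to ASCII, asking iconv to transliterate
// characters that have no direct ASCII equivalent.
std::string utf8_to_ascii_iconv(const std::string& input) {
    // "//TRANSLIT" is an extension of glibc / GNU libiconv (assumption).
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "UTF-8");
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }

    // Transliteration may expand characters (e.g. one character into several),
    // so reserve a generous output buffer.
    std::vector<char> out(input.size() * 4 + 16);

    char* in_ptr = const_cast<char*>(input.data());
    size_t in_left = input.size();
    char* out_ptr = out.data();
    size_t out_left = out.size();

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1) {
        // Invalid input, or the output buffer was too small after all.
        throw std::runtime_error("iconv conversion failed");
    }
    return std::string(out.data(), out.size() - out_left);
}
```

Calling utf8_to_ascii_iconv on a std::string holding "Schloß" should yield an ASCII-only string; whether ß becomes "ss" or a placeholder such as "?" depends on the platform, so it is worth testing on the target system.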

+3

Try this: ucnv_convert("US-ASCII", "UTF-8", target, targetSize, source, sourceSize, pError)

+3

I wrote a callback that decomposes and then does some replacement. This could probably be implemented as a transliteration instead. The code is in decompcb.c here, and the header is next to it. Install it on a Unicode-to-ASCII converter as follows:

 ucnv_setFromUCallBack(gConverter, &UCNV_FROM_U_CALLBACK_DECOMPOSE, NULL, NULL, NULL, &status);

Then use gConverter to convert from Unicode to ASCII.

+1

This is not an area in which I am an expert, but if you do not have a convenient library that makes this easy for you, then it might be best to create a lookup table/map from UTF-8 to ASCII values. That is, the key is a UTF-8 character and the value is a sequence of ASCII characters.
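The lookup-table idea can be sketched with the standard library alone. The table below is a tiny illustrative sample covering only the characters from the question; the names utf8_len and table_to_ascii are hypothetical, and unmapped non-ASCII characters are replaced with "?" as an arbitrary choice.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Length in bytes of the UTF-8 sequence that starts with lead byte b.
// Invalid lead bytes are treated as one-byte sequences.
static std::size_t utf8_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b >> 5) == 0x06) return 2;
    if ((b >> 4) == 0x0E) return 3;
    if ((b >> 3) == 0x1E) return 4;
    return 1;
}

// Hypothetical helper: maps each UTF-8 character to an ASCII replacement.
std::string table_to_ascii(const std::string& in) {
    // Key: UTF-8 bytes of one character. Value: its ASCII replacement.
    static const std::unordered_map<std::string, std::string> table = {
        {"\xC5\x81", "L"},   // Ł
        {"\xC3\xB3", "o"},   // ó
        {"\xC5\xBA", "z"},   // ź
        {"\xC3\xA7", "c"},   // ç
        {"\xC3\xA3", "a"},   // ã
        {"\xC3\x9F", "ss"},  // ß
    };

    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t n = utf8_len(lead);
        if (n > in.size() - i) n = in.size() - i;  // truncated sequence at the end

        if (lead < 0x80) {
            out += in[i];  // plain ASCII passes through unchanged
        } else {
            auto it = table.find(in.substr(i, n));
            if (it != table.end()) {
                out += it->second;
            } else {
                out += "?";  // no mapping known for this character
            }
        }
        i += n;
    }
    return out;
}
```

A real table would need many more entries, which is the main cost of this approach compared with a library that ships transliteration data.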

0

The decomposition ß -> ss tells me that you want the compatibility decomposition. In ICU, you need the Normalizer class for that. Afterwards, you will have something like L'odz'. From this string, you can simply remove the non-ASCII characters. ICU is not needed for that step; the plain STL will do.

0

Source: https://habr.com/ru/post/1277517/

