Iconv: EILSEQ with ASCII//IGNORE, but not with ASCII//TRANSLIT//IGNORE

Using iconv with //TRANSLIT//IGNORE to convert from UTF-8 to ASCII works fine; it replaces non-convertible characters with the appropriate transliteration for the current locale (de_DE in my case):

 > echo 'möp' | iconv -f 'UTF8' -t 'ASCII//TRANSLIT//IGNORE'
 moep

However, when using //IGNORE without //TRANSLIT it throws an error:

 > echo 'möp' | iconv -f 'UTF8' -t 'ASCII//IGNORE'
 mp
 iconv: illegal input sequence at position 5

I wonder why this is happening. The input sequence is exactly the same, so shouldn't //IGNORE just skip the characters it cannot convert? When using the iconv C API, I get an EILSEQ error, so basically I can't tell whether the input string contained invalid UTF-8 or not.

1 answer

The manual page for iconv(1) on Linux says the following:

  -t to-encoding, --to-code=to-encoding
      Use to-encoding for output characters.
      If the string //IGNORE is appended to to-encoding, characters that cannot be converted are discarded and an error is printed after conversion.

So it does skip the character, but it also reports an error at the end.

It seems that with //IGNORE you really cannot distinguish invalid characters in the input from characters that are merely non-convertible. In other words, both situations are reported the same way, as EILSEQ (EINVAL, by contrast, signals an incomplete multibyte sequence at the end of the input).
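
One possible workaround, sketched here under assumptions rather than taken from the iconv documentation: check the input separately by converting it to an encoding that can represent every Unicode character, such as UTF-32. A failure in that conversion cannot be blamed on a non-convertible character, so it points at malformed or truncated input. The helper name is_valid_utf8 is hypothetical, and the sketch assumes the local iconv implementation knows the encoding names "UTF-8" and "UTF-32".

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not part of iconv.
 * Returns 1 if buf[0..len) is well-formed UTF-8, 0 if it is malformed or
 * truncated, and -1 if the converter could not be opened. */
int is_valid_utf8(const char *buf, size_t len)
{
    assert(buf != NULL);

    /* UTF-32 can hold every Unicode character, so an error below cannot
     * come from a character that the target encoding lacks. */
    iconv_t cd = iconv_open("UTF-32", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)buf;   /* iconv() wants char **, it does not write to the input */
    size_t in_left = len;
    int valid = 1;

    while (in_left > 0) {
        char scratch[256];    /* converted output is discarded */
        char *out = scratch;
        size_t out_left = sizeof scratch;

        if (iconv(cd, &in, &in_left, &out, &out_left) == (size_t)-1) {
            if (errno == E2BIG)
                continue;     /* scratch buffer full: reuse it and carry on */
            valid = 0;        /* EILSEQ (invalid bytes) or EINVAL (truncated sequence) */
            break;
        }
    }

    iconv_close(cd);
    return valid;
}
```

If this check passes and a later conversion to 'ASCII//IGNORE' still fails with EILSEQ, the error can reasonably be attributed to non-convertible characters rather than to broken input. The cost is a second pass over the data.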


Source: https://habr.com/ru/post/1396042/

