Iconv: EILSEQ with ASCII//IGNORE, but not with ASCII//TRANSLIT//IGNORE

Question


Using iconv with //TRANSLIT//IGNORE to convert from UTF-8 to ASCII works fine; it replaces non-convertible characters with the correct transliteration for the current locale (de_DE in my case):

 > echo 'möp' | iconv -f 'UTF8' -t 'ASCII//TRANSLIT//IGNORE'
 moep

However, when using //IGNORE without //TRANSLIT, it reports an error:

 > echo 'möp' | iconv -f 'UTF8' -t 'ASCII//IGNORE'
 mp
 iconv: illegal input sequence at position 5

I wonder why this is happening. The input sequence is exactly the same, so shouldn't //IGNORE just skip invalid characters? When using the iconv C API, I get an EILSEQ error, so I cannot tell whether the input string contained invalid UTF-8 or not.



Thiefmaster Feb 12 '12 at 14:41


1 answer

hnhn · Answer 1 · 2015-08-28T09:25:53+0000

The manual page for iconv(1) on Linux says the following:

  -t to-encoding, --to-code=to-encoding
      Use to-encoding for output characters. If the string //IGNORE is appended to to-encoding, characters that cannot be converted are discarded and an error is printed after conversion.

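So iconv does skip the character, but it still reports an error at the end of the conversion. With //TRANSLIT//IGNORE, on the other hand, ö is transliterated to oe in the de_DE locale, so nothing has to be discarded and no error is reported.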

It seems that with //IGNORE you really cannot distinguish invalid characters in the input from valid characters that simply cannot be converted. In other words, both situations end up reported as the same EILSEQ error.

