I need to clean a file containing French text. The problem is that the file mistakenly contains multiple encodings.
I think some sections are ISO 8859-1 (Latin-1), but other parts contain single-byte characters that look like "extended" ASCII. In other words, the text appears to be 7-bit ASCII plus the following byte values:
- 0x82 for é (e acute)
- 0x8a for è (e grave)
- 0x88 for ê (e circumflex)
- 0x85 for à (a grave)
- 0x87 for ç (c cedilla)
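One way to narrow this down is to decode each listed byte with a few candidate single-byte codecs that ship with Python and keep only those that produce the expected characters. The candidate list below is an assumption; other DOS code pages could be added to it.

```python
# Byte values and the characters they should represent, as listed above.
EXPECTED = {
    0x82: "é",
    0x8A: "è",
    0x88: "ê",
    0x85: "à",
    0x87: "ç",
}

# Candidate single-byte codecs bundled with Python (assumed list, extend as needed).
CANDIDATES = ["latin-1", "cp1252", "mac_roman", "cp437", "cp850"]


def matches(codec: str) -> bool:
    """Return True if `codec` decodes every byte in EXPECTED to the expected character."""
    for byte, char in EXPECTED.items():
        try:
            if bytes([byte]).decode(codec) != char:
                return False
        except UnicodeDecodeError:
            # Byte is undefined in this codec, so it cannot be a match.
            return False
    return True


matching = [codec for codec in CANDIDATES if matches(codec)]
print(matching)  # ['cp437', 'cp850']
```

Only the IBM/DOS code pages (cp437 and cp850) map all five bytes as listed; in Latin-1 these bytes are C1 control characters, and in cp1252 they are punctuation and symbols such as `‚` and `Š`.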
What is this encoding?
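If the single-byte sections do turn out to be a DOS code page such as cp850, a minimal per-byte repair could be sketched as below. It assumes a heuristic: bytes 0x80 to 0x9F are C1 control characters in Latin-1 and almost never occur in real text, so they are decoded as cp850, and every other byte is decoded as Latin-1. `decode_mixed` is a hypothetical helper name.

```python
def decode_mixed(data: bytes) -> str:
    """Hypothetical helper: decode bytes that mix Latin-1 and cp850.

    Heuristic (an assumption): bytes 0x80-0x9F are treated as cp850,
    because they are unused control codes in Latin-1; all other bytes
    are treated as Latin-1.
    """
    out = []
    for b in data:
        codec = "cp850" if 0x80 <= b <= 0x9F else "latin-1"
        out.append(bytes([b]).decode(codec))
    return "".join(out)


# "café" with a Latin-1 é (0xE9), then "été" with cp850 é (0x82).
sample = b"caf\xe9 " + b"\x82t\x82"
print(decode_mixed(sample))  # café été
```

To fix a file, the raw bytes could be read in binary mode (`open(path, "rb")`), passed through this function, and written back out as UTF-8. The heuristic misreads DOS characters above 0x9F: for example, cp850 uses 0xAE and 0xAF for « and », which Latin-1 decodes as ® and ¯.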