What character encoding is this?

I need to clean up files containing French text. The problem is that each file mistakenly contains multiple encodings.

I think some sections are ISO 8859-1 (Latin-1), but other parts have text encoded in single-byte characters that look like "extended" ASCII. In other words, it is 7-bit ASCII plus the following:

  • 0x82 for é (e acute)
  • 0x8a for è (e grave)
  • 0x88 for ê (e circumflex)
  • 0x85 for à (a grave)
  • 0x87 for ç (c cedilla)

What encoding is this?

+3
2 answers

This is the original IBM PC encoding, code page 437.
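A quick way to check this against the bytes listed in the question is to decode each one with Python's built-in `cp437` codec. This is only a minimal sketch: the `observed` table and the `decode_cp437` helper are illustrative names, not part of the original posts.

```python
# Byte values from the question, paired with the characters the asker expects.
observed = {0x82: "é", 0x8A: "è", 0x88: "ê", 0x85: "à", 0x87: "ç"}


def decode_cp437(byte_value: int) -> str:
    """Decode a single byte using the IBM PC code page 437 codec."""
    return bytes([byte_value]).decode("cp437")


for byte_value, expected in observed.items():
    decoded = decode_cp437(byte_value)
    status = "match" if decoded == expected else "MISMATCH"
    print(f"0x{byte_value:02x} -> {decoded} (expected {expected}): {status}")
```

If every line reports a match, the byte values are consistent with code page 437.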

+6

0x87 is ç (c cedilla).
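For the cleanup described in the question (Latin-1 and code page 437 mixed in one file), one possible heuristic is to decide byte by byte. In ISO 8859-1 the range 0x80 to 0x9F holds only control characters, which rarely occur in text, while code page 437 places most French accented letters there. The sketch below relies on that assumption. `decode_mixed` is a hypothetical helper, and it would misread any code page 437 characters at 0xA0 or above (such as á or ñ), so treat it as a starting point rather than a complete solution.

```python
def decode_mixed(data: bytes) -> str:
    """Decode bytes that mix ISO 8859-1 and code page 437 text.

    Heuristic (an assumption, not a guarantee):
      - below 0x80: plain ASCII, identical in both encodings
      - 0x80..0x9F: unused for printable text in Latin-1, so treat as CP437
      - 0xA0 and above: treat as Latin-1
    """
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        elif b < 0xA0:
            out.append(bytes([b]).decode("cp437"))
        else:
            out.append(bytes([b]).decode("latin-1"))
    return "".join(out)


# First é is CP437 (0x82), second é is Latin-1 (0xE9).
print(decode_mixed(b"\x82t\xe9"))
# ç encoded as CP437 (0x87).
print(decode_mixed(b"fran\x87ais"))
```

The result can then be written back out in a single encoding, for example with `text.encode("utf-8")`.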

0

Source: https://habr.com/ru/post/1742505/

