How to avoid unintentionally encoding UTF-8 files as ASCII / ANSI?

When a file encoded as UTF-8 without a BOM is edited, its contents may lose any Unicode characters outside the ASCII/ANSI range. The next time the file is opened, some text editors (Notepad++, for instance) interpret it as ASCII/ANSI encoded and open it as such. Unaware of the change, the user continues editing, now adding non-ANSI Unicode characters that are silently destroyed because they are stored as ANSI. A menu option may exist (Notepad++) to open ANSI files as UTF-8 without BOM, but that leads to the inverse problem of inadvertently misinterpreting genuine ANSI files.
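To make the failure mode concrete, here is a minimal Python sketch (the string and code page are illustrative, not from the question) of what happens when non-ANSI text is forced through a single-byte code page:

```python
# Round-tripping non-ANSI text through a single-byte code page
# (here Windows-1252) silently replaces the characters it cannot represent.
s = "shalom א"  # U+05D0 is outside the Windows-1252 repertoire
damaged = s.encode("cp1252", errors="replace").decode("cp1252")
print(damaged)  # shalom ?
```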

3 answers

One way is to add a character outside the ANSI range in a comment in the file. Depending on the editor's detection algorithm, this can cause it (Notepad++, for instance) to recognize the file as UTF-8 without BOM.
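The detection heuristic this trick relies on can be sketched as follows. This is a minimal Python illustration, not Notepad++'s actual algorithm: a BOM-less file containing at least one valid multi-byte UTF-8 sequence is distinguishable from ANSI, while a pure-ASCII file is ambiguous.

```python
def guess_encoding(data: bytes) -> str:
    """Rough sketch of BOM-less encoding detection (illustrative only)."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"          # explicit BOM
    try:
        data.decode("ascii")
        return "ascii"              # ambiguous: valid as ANSI and as UTF-8
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"              # contains a valid multi-byte sequence
    except UnicodeDecodeError:
        return "ansi"               # some single-byte code page

# A non-ASCII character in a comment makes the file unambiguously UTF-8:
print(guess_encoding("<!-- א -->".encode("utf-8")))  # utf-8
print(guess_encoding(b"plain text"))                 # ascii
```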

In an HTML document, for example, you can follow the charset declaration in the header with such a Unicode comment, here U+05D0 HEBREW LETTER ALEF: <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <!-- א -->


How do you suggest the editor tell the difference between ASCII/ANSI and UTF-8 without a BOM when the files look the same?


Configure your editor to always use UTF-8 if possible; if that is not possible, complain to the creators of your editor. Non-Unicode encodings are, IMO, outdated and should be treated as such.
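The same principle applies to your own programs: never rely on the platform default encoding. A short Python sketch (file name is hypothetical):

```python
import os
import tempfile

# On Windows the default encoding may be a legacy ANSI code page, which
# causes exactly the silent corruption described above; always pass
# encoding="utf-8" explicitly.
path = os.path.join(tempfile.gettempdir(), "utf8_demo.txt")  # hypothetical file
with open(path, "w", encoding="utf-8") as f:
    f.write("ASCII text and beyond: א\n")

with open(path, "r", encoding="utf-8") as f:
    content = f.read()
print("א" in content)  # True
```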

Files that use only characters in the ASCII (7-bit) range are byte-identical in UTF-8 anyway, so if you want to transmit something in ASCII encoding, just don't enter non-ASCII characters.
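That ASCII is a byte-for-byte subset of UTF-8 is easy to verify:

```python
text = "plain ASCII only"
# Encoding pure-ASCII text as ASCII or as UTF-8 yields identical bytes.
assert text.encode("ascii") == text.encode("utf-8")

# A single non-ASCII character breaks the equivalence: U+05D0 HEBREW
# LETTER ALEF occupies two bytes in UTF-8.
assert "א".encode("utf-8") == b"\xd7\x90"
print("ok")
```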


Source: https://habr.com/ru/post/1725996/
