Unknown characters

I read a line from a UTF-8 encoded file and need to match it against a regular expression. The first character in the file is #, but the first character of the string I get is an invisible, seemingly empty character. I converted that character to UTF-8 bytes and got [-17, -69, -65]. Does anyone know what it is and how to handle it with a regular expression?

1 answer

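Some editors (for example, Notepad) add a BOM (byte order mark) signature when saving UTF-8 text. Its bytes are 0xEF, 0xBB, 0xBF, which print as [-17, -69, -65] when treated as signed bytes. Before reading a line from such a file, check for these three bytes at the start and skip them if they are present.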
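Since the question asks about regular expressions, here is a minimal sketch of both checks in Java. It assumes the file is decoded as UTF-8, in which case the three BOM bytes become the single character U+FEFF at the start of the first line. The names BomExample, stripBom and startsWithUtf8Bom are made up for this illustration and are not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

final class BomExample {
    // After UTF-8 decoding, the bytes EF BB BF appear as the character U+FEFF.
    private static final Pattern LEADING_BOM = Pattern.compile("^\uFEFF");

    // Regex variant: remove a BOM from the start of an already decoded string.
    static String stripBom(String line) {
        return LEADING_BOM.matcher(line).replaceFirst("");
    }

    // Byte variant: check the first three raw bytes before decoding.
    // Java bytes are signed, so mask with 0xFF before comparing.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // The same bytes as in the question, followed by "# x".
        byte[] raw = {-17, -69, -65, '#', ' ', 'x'};
        String decoded = new String(raw, StandardCharsets.UTF_8);

        System.out.println(startsWithUtf8Bom(raw));
        System.out.println(stripBom(decoded).startsWith("#"));
    }
}
```

Only the first line of a file needs this treatment, so calling stripBom once on that line before applying your own pattern should be enough.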

Another option is not to edit UTF-8 text in Notepad at all. Use another program, such as Notepad++ or Kate, that lets you control whether a BOM is added.


Source: https://habr.com/ru/post/1273212/

