Guessing UTF-8 Encoding

I have a question that may be quite naive, but I feel the need to ask because I really don't know what is going on. I'm on Ubuntu.

Suppose I run

echo "t" > test.txt

If I then run

file test.txt

I get

test.txt: ASCII text

If I then run

echo "å" > test.txt

and run file again, I get

test.txt: UTF-8 Unicode text

How does this happen? How does the file “know” the encoding, or, alternatively, how does it guess it?

Thanks.

+3
4 answers

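Files generally do not record their own encoding, but UTF-8 has a recognizable byte structure (see Wikipedia). file inspects the bytes and, if they form valid UTF-8, reports the file as UTF-8. In other words, it guesses. ASCII characters (for example, 't') are encoded identically in ASCII and in UTF-8, so when a file contains nothing but ASCII, file reports it as ASCII, even though it is also valid UTF-8. ASCII is a subset of UTF-8.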
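As a rough illustration of this kind of guessing, here is a minimal Python sketch. It is not how file is actually implemented; the function name guess_text_encoding and the returned labels are made up for this example. It simply tries the strictest interpretation first (ASCII), then UTF-8, and gives up otherwise.

```python
def guess_text_encoding(data: bytes) -> str:
    """Guess a label for raw bytes by trying ASCII first, then UTF-8.

    Hypothetical helper for illustration only; the real file utility
    uses its own tests and knows many more encodings.
    """
    try:
        data.decode("ascii")
        return "ASCII text"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8 Unicode text"
    except UnicodeDecodeError:
        return "unknown 8-bit data"


print(guess_text_encoding(b"t\n"))                    # ASCII text
print(guess_text_encoding("å\n".encode("utf-8")))     # UTF-8 Unicode text
print(guess_text_encoding("å\n".encode("latin-1")))   # unknown 8-bit data
```

Note that the order matters: every ASCII file would also pass the UTF-8 test, so ASCII has to be checked first to get the more specific answer.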

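Your terminal is evidently set to UTF-8, so echo writes UTF-8 bytes. If it were set to something else, say UTF-16, then

echo "å" > test.txt

would produce a UTF-16 file.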
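To see why the resulting file depends on the encoding in use when the text is written, the following Python sketch prints the bytes that "å" plus a newline becomes under three encodings. The variable names are just for this example.

```python
text = "å\n"

# The same two characters become different byte sequences
# depending on which encoding is used to write them.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")
latin1_bytes = text.encode("latin-1")

print(utf8_bytes.hex(" "))    # c3 a5 0a
print(utf16_bytes.hex(" "))   # e5 00 0a 00
print(latin1_bytes.hex(" "))  # e5 0a
```

The file on disk holds only these bytes; nothing in it says which encoding produced them, which is why a tool has to guess.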

+4

From the file manpage:

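If a file does not match any of the entries in the magic file, it is examined to see if it seems to be a text file. ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets (such as those used on Macintosh and IBM PC systems), UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC character sets can be distinguished by the different ranges and sequences of bytes that constitute printable text in each set. If a file passes any of these tests, its character set is reported. ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified as "text" because they will be mostly readable on nearly any terminal; UTF-16 and EBCDIC are only "character data" because, while they contain text, it is text that will require translation before it can be read. In addition, file will attempt to determine other characteristics of text-type files. If the lines of a file are terminated by CR, CRLF, or NEL, instead of the Unix-standard LF, this will be reported. Files that contain embedded escape sequences or overstriking will also be identified.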

+4

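UTF-8 is "ASCII-friendly", meaning that a text consisting only of ASCII characters is encoded identically in ASCII and in UTF-8.

Note: some people believe there are 256 ASCII characters. There are only 128. The ISO-8859-x encodings are character sets whose first 128 characters are ASCII, while the remaining ones are other characters.

Moreover, UTF-8 is well structured: a character takes 1, 2, 3 or 4 bytes, and the 4-, 3- and 2-byte sequences follow a recognizable pattern. 1-byte characters use values 0 to 127; the bytes of multi-byte characters use values 128 to 255.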
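The byte structure of UTF-8 can be checked mechanically. The sketch below is a simplified structural check, assuming only the lead-byte ranges and the 10xxxxxx continuation-byte pattern; it does not reject every invalid case that a strict decoder would (for example certain overlong 3- and 4-byte forms or encoded surrogates). The function names are invented for this example.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence starting with `lead` should have.

    Returns 0 for bytes that cannot start a sequence (continuation bytes
    0x80-0xBF and values that never appear in UTF-8).
    """
    if lead < 0x80:
        return 1          # plain ASCII, 0 to 127
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def looks_like_utf8(data: bytes) -> bool:
    """Structural check: every lead byte is followed by the right number
    of continuation bytes (bit pattern 10xxxxxx)."""
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if n == 0:
            return False
        tail = data[i + 1:i + n]
        if len(tail) != n - 1 or any((b & 0xC0) != 0x80 for b in tail):
            return False
        i += n
    return True


print(looks_like_utf8("å".encode("utf-8")))    # True
print(looks_like_utf8("å".encode("latin-1")))  # False
```

In practice, calling data.decode("utf-8") and catching UnicodeDecodeError is the simpler and stricter test; the manual version is only meant to make the byte pattern visible.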

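If a non-UTF-8 file (for example, UTF-16) is decoded as UTF-8, the decoder will almost certainly run into invalid sequences. A file that contains only ASCII, on the other hand, is always valid UTF-8.

So when a file validates as UTF-8, it is either "real" UTF-8 or, far less likely, some other encoding that happens to look like valid UTF-8.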

+3

.

BOM (Byte-Order Mark) ( , / )
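For reference, a byte-order mark is a fixed byte sequence at the very start of a file, so checking for one is a simple prefix test. The sketch below uses the BOM constants from Python's standard codecs module; the detect_bom helper and the encoding names it returns are choices made for this example. A file written by echo on a typical Linux terminal has no BOM, so this check alone would not identify it.

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return an encoding name if `data` starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(codecs.BOM_UTF8 + "å\n".encode("utf-8")))  # utf-8-sig
print(detect_bom("å\n".encode("utf-8")))                     # None
```

When no BOM is present, a tool has to fall back on examining the byte patterns of the content itself.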

, . 2 ( , 4 5 ).

.


:

, .

UTF-8 is special in that a text using only ASCII characters is byte-for-byte the same in UTF-8 as in ASCII. Such a file is still UTF-8!

, , UTF-8 .


!

, , . ? , .

+2

Source: https://habr.com/ru/post/1717393/

