Git: Diff doesn't handle character encoding other than UTF-8?

I created a repo and added two files, one encoded in UTF-8 and one in Latin-2, both with this content:

árvíztűrő tükörfúrógép
ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP

See https://github.com/bimlas/git-test/commit/872370caf91f1faaf931c1228c797f3d10d6435d

The output of git log -p 82904e60:

    commit 82904e60d1940c036c8190e2a41de6b423727a7c
    Author: BimbaLaszlo <bimbalaszlo@gmail.com>
    Date:   Mon Jul 27 14:38:35 2015 +0200

        initial commit

    diff --git a/fileencoding/latin2.txt b/fileencoding/latin2.txt
    new file mode 100644
    index 0000000..7165bc9
    --- /dev/null
    +++ b/fileencoding/latin2.txt
    @@ -0,0 +1,2 @@
    +<E1>rv<ED>zt<FB>r<F5> t<FC>k<F6>rf<FA>r<F3>g<E9>p^M
    +<C1>RV<CD>ZT<DB>R<D5> T<DC>K<D6>RF<DA>R<D3>G<C9>P^M
    diff --git a/fileencoding/utf8.txt b/fileencoding/utf8.txt
    new file mode 100644
    index 0000000..80e1878
    --- /dev/null
    +++ b/fileencoding/utf8.txt
    @@ -0,0 +1,2 @@
    +árvíztűrő tükörfúrógép^M
    +ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP^M
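The <E1>, <ED>, ... markers above are the raw Latin-2 byte values. The pager (usually less) prints bytes that are not valid UTF-8 as two-digit hex, and ^M is the carriage return of the CRLF line endings. The minimal Python sketch below approximates that display. It assumes the files are ISO-8859-2 and UTF-8 with CRLF endings. The helper show_like_pager and the error-handler name "pager_hex" are hypothetical and only approximate what a real pager does.

```python
import codecs


def _hex_bytes(err):
    # Replace each byte that is not valid UTF-8 with <XX>, mimicking less.
    bad = err.object[err.start:err.end]
    return "".join(f"<{b:02X}>" for b in bad), err.end


# Hypothetical error-handler name, registered only for this sketch.
codecs.register_error("pager_hex", _hex_bytes)


def show_like_pager(data: bytes) -> str:
    """Approximate a UTF-8 pager: valid UTF-8 is shown as text,
    invalid bytes as <XX>, and carriage returns as ^M."""
    text = data.decode("utf-8", errors="pager_hex")
    return text.replace("\r", "^M")


line = "árvíztűrő tükörfúrógép\r"
print(show_like_pager(line.encode("iso8859_2")))  # prints <E1>rv<ED>zt<FB>r<F5> t<FC>k<F6>rf<FA>r<F3>g<E9>p^M
print(show_like_pager(line.encode("utf-8")))      # prints árvíztűrő tükörfúrógép^M
```

The same text produces different output only because the Latin-2 bytes are not valid UTF-8. Git passes both files through unchanged.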

I get the same output on Linux and on Windows (where my system encoding is Latin-2). I also tried without a pager (git --no-pager log -p 82904e60) and got the same result, just without the escape codes:

    commit 82904e6
    Author: BimbaLaszlo <bimbalaszlo@gmail.com>
    Date:   2015-07-27 14:38:35 +0200

        initial commit

    diff --git a/fileencoding/latin2.txt b/fileencoding/latin2.txt
    new file mode 100644
    index 0000000..7165bc9
    --- /dev/null
    +++ b/fileencoding/latin2.txt
    @@ -0,0 +1,2 @@
    + rv zt r  t k rf r g p
    + RV ZT R  T K RF R G P
    diff --git a/fileencoding/utf8.txt b/fileencoding/utf8.txt
    new file mode 100644
    index 0000000..80e1878
    --- /dev/null
    +++ b/fileencoding/utf8.txt
    @@ -0,0 +1,2 @@
    +árvíztűrő tükörfúrógép
    +ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP

Logging latin2.txt on its own gives the same result, so the problem is not caused by mixing files with different encodings in one output.

How can I configure Git to print the characters correctly, even without a pager?

EDIT

I don't think the problem is terminal-related. For example, in Windows PowerShell the latin2.txt file displays fine, but utf8.txt looks garbled:

[Screenshot: Same coding with different output]

1 answer

Git doesn't care about character encodings at all. A file is just a bunch of bytes.

The mapping from bytes to characters is done by your terminal (and by your pager, if you use one). If it is configured to decode UTF-8, your Latin-2 file looks broken; if it is configured to decode Latin-2, the UTF-8 file looks broken.
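This can be checked directly. The stored bytes never change; only the codec used to turn them into characters does. The minimal Python sketch below assumes the two files contain exactly the sample text in ISO-8859-2 and UTF-8. It decodes each byte string with both codecs. The helper render is hypothetical. errors="replace" stands in for whatever a real terminal does with bytes it cannot decode, and terminals differ on this point.

```python
SAMPLE = "árvíztűrő tükörfúrógép"

# The bytes stored for each file; Git never converts between them.
stored = {
    "latin2.txt": SAMPLE.encode("iso8859_2"),  # 1 byte per character
    "utf8.txt": SAMPLE.encode("utf-8"),        # 2 bytes per accented letter
}


def render(data: bytes, terminal_codec: str) -> str:
    """Decode bytes the way a terminal set to terminal_codec would;
    undecodable bytes become U+FFFD instead of raising an error."""
    return data.decode(terminal_codec, errors="replace")


# Two "terminals" that differ only in the encoding they decode with.
for codec in ("utf-8", "iso8859_2"):
    for name, data in stored.items():
        shown = render(data, codec)
        status = "ok" if shown == SAMPLE else "looks broken"
        print(f"{codec:>9} terminal, {name:<10}: {shown!r} ({status})")
```

Each file is displayed correctly only by the terminal whose codec matches the bytes. This is the behaviour described above.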

The encoding attribute (see git help gitattributes) may give some tools a hint on how to decode the file correctly, but I have never used it. For example, GitHub may be smart enough to look at this attribute and decode these files differently.


Source: https://habr.com/ru/post/1246680/

