Discussion Linus - Git versus data corruption?

Question

I watched Linus (the creator of git) give a talk on git. At some point he talks about how git is safer, and he says that other SCMs cannot deal with data corruption. So I looked into it and found that this is not true.

For example, this link says: "Replace offensive commit with a new commit as a whole, re-creating about the same changes."

Maybe I misunderstood him. Any idea what he had in mind?

He said several times that git is the only SCM that lets you verify that the data you get out is the same data you put in.

-2

git version-control

IAdapter Nov 12 '11 at 13:04

2 answers

sehe · Answer 1 · 2011-11-12T16:17:04+0000

Linus was referring to the fact that git identifies objects by their hash.

Git trees are objects consisting of multiple entries (trees, blobs) (read: blob = file, roughly).

The cryptographic hash of a parent node is, in effect, a hash of the hashes of all its child trees/blobs, recursively. Such trees are known as Merkle (hash) trees and have the interesting property that the top-level hash is a cryptographically strong hash that uniquely identifies the entire tree.

Note that a commit's hash covers the commit attributes, and those include the parent commit IDs. That is, if any file in any revision ever changes, the hash of the blob changes, so the hash of every containing tree changes, the hash of the snapshot (the root tree) changes, and the hash of the commit changes; the hash of every child commit then has to change as well, and so on. The whole history would change.
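
The knock-on effect can be sketched with a toy history. This is only an illustration under assumptions: the function names are hypothetical, every snapshot holds a single file called `file.txt`, and the author/committer lines are fixed placeholders rather than real identities or timestamps.

```python
import hashlib

def object_id(kind: str, payload: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0<payload>'."""
    header = f"{kind} {len(payload)}".encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()

def snapshot_id(file_content: bytes) -> str:
    """Root tree ID for a snapshot with one file named 'file.txt' (assumed)."""
    blob = object_id("blob", file_content)
    entry = b"100644 file.txt\x00" + bytes.fromhex(blob)
    return object_id("tree", entry)

def commit_id(tree: str, parents: list, message: str) -> str:
    """Commit ID; the payload names the root tree and every parent commit."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    # Placeholder identity and timestamp, not real data.
    lines.append("author A U Thor <author@example.com> 0 +0000")
    lines.append("committer A U Thor <author@example.com> 0 +0000")
    payload = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    return object_id("commit", payload)

def history_ids(file_versions: list) -> list:
    """Build a linear chain of commits and return their IDs in order."""
    ids, parents = [], []
    for number, content in enumerate(file_versions):
        cid = commit_id(snapshot_id(content), parents, f"revision {number}")
        ids.append(cid)
        parents = [cid]
    return ids

original = history_ids([b"one\n", b"two\n", b"three\n"])
tampered = history_ids([b"one\n", b"TWO\n", b"three\n"])  # edit revision 1 only

changed = [a != b for a, b in zip(original, tampered)]
print(changed)  # [False, True, True]
```

Only the middle revision's content was altered, yet the last commit's ID changes too, because it embeds its parent's ID.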

If any of these rules is violated, it will be trivially detectable:

the hash of a single tree can be verified deterministically in O(n), where n is the number of objects in the root tree
the integrity of the entire history of a branch can be verified deterministically in O(n), where n is the number of nodes in the revision chain
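
A linear-time check of this kind can be sketched over a toy content-addressed store: re-hash each object once and compare the result with the ID it is filed under. The dictionary store and the names `store` and `find_corrupt_objects` are assumptions for this example; this is not how `git fsck` is implemented.

```python
import hashlib

def object_id(kind: str, payload: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0<payload>'."""
    header = f"{kind} {len(payload)}".encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()

def store(objects: dict, kind: str, payload: bytes) -> str:
    """File an object under its own hash and return that ID."""
    oid = object_id(kind, payload)
    objects[oid] = (kind, payload)
    return oid

def find_corrupt_objects(objects: dict) -> list:
    """One pass over the store: an object is corrupt if its content
    no longer hashes to the ID it is stored under."""
    return [
        oid
        for oid, (kind, payload) in objects.items()
        if object_id(kind, payload) != oid
    ]

objects = {}
main_c = store(objects, "blob", b"int main(void) { return 0; }\n")
readme = store(objects, "blob", b"README\n")

print(find_corrupt_objects(objects) == [])        # True: store is intact

# Simulate bit rot or tampering: the content changes, the ID does not.
objects[main_c] = ("blob", b"int main(void) { return 1; }\n")
print(find_corrupt_objects(objects) == [main_c])  # True: mismatch detected
```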

In fact, git-verify-tag and git fsck are useful commands for performing such verification explicitly. In addition, verification happens automatically in git subcommands (send-pack, receive-pack, read-tree, write-tree, etc.).

Re: the "Replace offensive commit" link

In his first post in that thread, Linus already defuses the bomb:

Hm. Scary. That should not have succeeded with a corrupt repo.
Unless you made a .grafts file to hide the damage, or something like that?

Denis Bueno immediately confirms this in his reply.

Tamás Szelei · Answer 2 · 2011-11-12T13:10:04+0000

I think he meant the fact that git uses a cryptographic hash to ensure the data is correct, and that it stores snapshots rather than sets of changes. Saying that git is the only SCM that does this is probably an exaggeration today, but it may have been true in the past, before the advent of DVCSs. Please note that the term “snapshot” does not mean that it stores all files. See this answer for more details.
