How does a version control system restore a revision?

My question is more general than the one declared in the title.

I know that version control only stores information about differences. As I understand it, so is Wikipedia, as well as github.

But everyone has the opportunity to show the entire file with a specific version. Do they gradually restore it from the first version to a certain one?

And one more question. If they only store differences, as they show them in ui with context (some text before and after the changes).

EDIT: github stores whole snapshots instead of delta

+6
source share
3 answers

I know that version control only stores information about differences.

Since Git's question is a constructive solution for storing content, not differences , this is not exactly what Git does.
However, it has a “packaged” format to store objects in a delta form using the binary delta from the LibXDiff library. But it is mainly used for network transmission.
See " Is the Git Binary Algorithm (Delta Storage) Standardized? ".
This is why Git is a “ delta resolution ” upon receipt.

+6
source

For a very interesting read on the pros and cons of various ways of storing version control data, I highly recommend reading Eric Sank's article, Communication of Time and Space in Version Control Storage .

Storage is one of the most difficult tasks for version control of a system. For each file, we must store every version that has ever existed. The logical size of the version control repository has never been compressed. It just continues to grow and grow, and every old version should remain available.

So what is the best way to store each version of everything?

+4
source

Wikipedia, unfortunately, saves each individual revision in the database as XML (?) As text.

Take a look at the wikipedia database schema . In particular, recent changes and text.

Consequently, they have a wonderful O (1) search for the first copy of the biology page. This has an unfortunate side effect that caused wikipedia the cost of the technology for a balloon from $ 8 million USD in 2010-2011 to 12 million US dollars in 2011-2012. This is despite the fact that hard drives (and everything else) are becoming cheaper, not more expensive.

So much for version control storage of each file. Git takes a cute approach. See Is the Git Storage Model Wasteful? .

It stores each file, similar to the method described above. As soon as the space occupied by the repo exceeds a certain limit, he performs a brute force reinstall (it is possible to establish how hard he tries - --window = [N], --depth = [N]), which can take several hours. It uses a combination of delta and lossless compression for the mentioned repack (recursively delta, then applies lossless to any bits that you have).

Others, such as SVN, use simple delta compression. (from a memory you should not trust).

Footnote: Delta compression saves incremental changes. lossless compression is pretty much like zip, rar, etc.

+3
source

Source: https://habr.com/ru/post/916835/


All Articles