Wikipedia, unfortunately, stores each individual revision in the database as XML(?), as plain text.
Take a look at the Wikipedia database schema. In particular, the recentchanges and text tables.
Consequently, they get a wonderful O(1) lookup for the very first copy of the Biology page. This has the unfortunate side effect of causing Wikipedia's technology costs to balloon from $8 million USD in 2010-2011 to $12 million USD in 2011-2012, despite the fact that hard drives (and everything else) keep getting cheaper, not more expensive.
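A toy sketch of the tradeoff (this is not MediaWiki's actual schema, just an illustration of the idea): storing every revision in full makes retrieving any old revision a single lookup, but storage grows with the total size of every copy ever saved.

```python
# Toy illustration (not MediaWiki's real schema): each save stores the
# complete text, so any revision is an O(1) lookup, but space grows with
# the total size of all revisions ever written.

revisions = {}  # revision_id -> full page text

def save_revision(rev_id, full_text):
    # Store the complete text, even if only one word changed.
    revisions[rev_id] = full_text

def get_revision(rev_id):
    # Any historical revision, including the very first "Biology" page,
    # comes back with a single dictionary lookup.
    return revisions[rev_id]

save_revision(1, "Biology is the study of life.")
save_revision(2, "Biology is the scientific study of life.")
print(get_revision(1))  # O(1) lookup, paid for in storage
```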
So much for version-control storage of every file revision. Git takes a cuter approach. See Is the Git Storage Model Wasteful?.
It stores each file whole, much as described above. Once the space taken up by the repo exceeds a certain threshold, it performs a brute-force repack (you can tune how hard it tries with --window=[N] and --depth=[N]), which can take several hours. That repack uses a combination of delta and lossless compression (delta recursively, then lossless compression on whatever bits are left).
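To make the "delta, then lossless" idea concrete, here is a minimal sketch. It is not git's real packfile format or delta algorithm; it just stores a base version whole, expresses later versions as text diffs against their predecessor, and zlib-compresses everything that ends up on disk.

```python
# Minimal sketch of "delta, then lossless" (NOT git's real packfile format):
# keep one base blob in full, store later versions as diffs against the
# previous version, then zlib-compress whatever gets written out.
import difflib
import zlib

def make_delta(old: str, new: str) -> str:
    # Text delta: only the changed lines, as a unified diff.
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
    ))

def pack(versions: list[str]) -> list[bytes]:
    packed = [zlib.compress(versions[0].encode())]    # base stored whole
    for old, new in zip(versions, versions[1:]):
        delta = make_delta(old, new)                  # delta step
        packed.append(zlib.compress(delta.encode()))  # lossless step
    return packed

versions = [
    "line one\nline two\n",
    "line one\nline two changed\n",
    "line one\nline two changed\nline three\n",
]
blobs = pack(versions)
print([len(b) for b in blobs])  # deltas compress far smaller than full copies
```

Git's actual repack works on binary deltas and uses the --window/--depth settings to decide how many candidate objects it compares when hunting for good delta bases, which is what makes an aggressive repack so slow.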
Others, such as SVN, use plain delta compression (going from memory here, so don't trust this too much).
Footnote: delta compression stores incremental changes; lossless compression is pretty much what zip, rar, etc. do.