What is the most compact way to store diffs in a database?

Question

I want to implement something similar to the Wikimedia revision history. What are the best PHP functions / libraries / extensions / algorithms for this?

I would like the diffs to be as compact as possible, but I'm happy to be limited to showing only the difference between each revision and its sibling, and to rolling back only one revision at a time.

In some cases only a few characters may change, while in other cases the whole line may change, so I want to understand whether some methods are better for small changes than for large ones, and whether in some cases it is more efficient to just store whole copies.

Backing the whole system with something like Git or SVN seems a bit extreme, and I don't really want to store files on disk.

+4

version-control php mysql

Tim Feb 09 '12 at 19:15

3 answers

You have to ask yourself which kind of data end users will request more often: whole revisions or the differences between revisions. For the diffing itself, I would use the standard Unix diff. Then, depending on the answer to the question above, store either the diffs or the entire versions in the database.

"Backing the whole system with something like Git or SVN seems a bit extreme"

Why? GitHub, AFAIR, stores its wikis this way ;)

+2

wikp Feb 09 '12 at 19:23

I would use diff to create the deltas and patch to apply one or more edit sequences to bring a document to a known state. Of course, the more you do this, the clearer it becomes that you could offload the task to a version control tool. I have twice reworked diff/patch systems to use SVN for this kind of task.
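To make the diff/patch idea concrete, here is a minimal sketch in Python (not PHP, and not the Unix diff and patch tools themselves). It builds a character-level delta with the standard library's difflib.SequenceMatcher and applies it again. The function names make_delta and apply_delta and the delta format are assumptions made up for this illustration.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Describe how to turn `old` into `new` as a list of operations.

    ("copy", start, end) reuses old[start:end]; ("add", text) inserts
    literal text. Deleted spans are simply not copied, so they cost nothing.
    """
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))
    return delta


def apply_delta(old: str, delta: list) -> str:
    """Rebuild the newer text from the older text plus a delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


old_text = "The quick brown fox jumps over the lazy dog."
new_text = "The quick red fox jumps over the lazy dog!"
delta = make_delta(old_text, new_text)
restored = apply_delta(old_text, delta)
```

A delta like this only stores the text that is new, so a small edit to a long document stays small. If most of the text changes, the delta approaches the size of a full copy, and storing the whole revision is just as compact.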

0

Duane Gran Feb 09 '12 at 19:32

Francis Avila · Accepted Answer · 2012-02-09T19:37:37+0000

It is much easier to store each record whole than to store diffs of them. Then, if you want a diff of two versions, you can generate it as needed using the PEAR Text_Diff library.
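Generating a diff on demand from two stored full revisions takes only a few lines. The sketch below is in Python and uses the standard library's difflib in place of Text_Diff. The helper name render_diff and its labels are made up for illustration.

```python
import difflib


def render_diff(old: str, new: str,
                old_label: str = "previous",
                new_label: str = "current") -> str:
    """Return a unified diff of two full revisions, computed when requested."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(lines)


rev1 = "Title\nFirst paragraph.\nSecond paragraph.\n"
rev2 = "Title\nFirst paragraph, edited.\nSecond paragraph.\n"
print(render_diff(rev1, rev2))
```

Nothing about the diff is stored. Only the two full texts live in the database, and the diff is thrown away after it has been displayed.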

I like to store all versions of a record in a single table and retrieve the latest one using MAX(revision), a boolean "current" attribute, or similar. Others prefer to denormalize and keep a mirror table that holds the non-current versions.
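A minimal sketch of the single-table layout, written in Python with an in-memory SQLite database standing in for MySQL. The table name page_revision, its columns, and the helper functions are assumptions for illustration only.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one row per (page, revision)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE page_revision (
            page_id  INTEGER NOT NULL,
            revision INTEGER NOT NULL,
            body     TEXT    NOT NULL,
            PRIMARY KEY (page_id, revision)
        )
        """
    )
    return conn


def save_revision(conn: sqlite3.Connection, page_id: int, body: str) -> int:
    """Store a full copy of the text under the next revision number."""
    row = conn.execute(
        "SELECT COALESCE(MAX(revision), 0) FROM page_revision WHERE page_id = ?",
        (page_id,),
    ).fetchone()
    next_revision = row[0] + 1
    conn.execute(
        "INSERT INTO page_revision (page_id, revision, body) VALUES (?, ?, ?)",
        (page_id, next_revision, body),
    )
    conn.commit()
    return next_revision


def latest_body(conn: sqlite3.Connection, page_id: int):
    """Fetch the current text by selecting the row with MAX(revision)."""
    row = conn.execute(
        """
        SELECT body FROM page_revision
        WHERE page_id = ?
          AND revision = (SELECT MAX(revision)
                          FROM page_revision WHERE page_id = ?)
        """,
        (page_id, page_id),
    ).fetchone()
    return row[0] if row else None
```

Rolling back one revision then amounts to saving the previous body again as a new revision, or deleting the newest row, depending on whether the history should keep a record of the rollback.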

If you store diffs instead, your schema and algorithms become much more complex. You then need to store at least one "full" revision plus a number of "diff" revisions, and you have to reconstruct the full text from a chain of diffs whenever you need a complete version. (This is how SVN stores things. Git conceptually stores a complete snapshot of each revision rather than a diff, although its packfiles delta-compress objects behind the scenes.)
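The extra work is easy to see in a small sketch. The Python below keeps one full base revision plus a list of forward deltas, and has to replay them to read anything but the first revision. The class DeltaHistory, its methods, and the delta format are assumptions for illustration; this is not how SVN is actually implemented.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Encode `new` as (start, end) slices of `old` plus literal strings."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append((i1, i2))      # reuse old[i1:i2]
        elif tag != "delete":
            ops.append(new[j1:j2])    # literal text taken from the new version
    return ops


def apply_delta(old: str, ops: list) -> str:
    """Rebuild the newer text from the older text and its delta."""
    return "".join(
        old[op[0]:op[1]] if isinstance(op, tuple) else op for op in ops
    )


class DeltaHistory:
    """One full base text (revision 1) followed by forward deltas."""

    def __init__(self, first_text: str):
        self.base = first_text
        self.deltas = []
        # Cached only so commit() can diff against the newest text;
        # a database-backed version would have to rebuild or store it.
        self._latest = first_text

    def commit(self, new_text: str) -> int:
        """Append a delta against the newest text; return the revision number."""
        self.deltas.append(make_delta(self._latest, new_text))
        self._latest = new_text
        return len(self.deltas) + 1

    def checkout(self, revision: int) -> str:
        """Replay revision - 1 deltas on top of the base text."""
        if not 1 <= revision <= len(self.deltas) + 1:
            raise ValueError("unknown revision")
        text = self.base
        for ops in self.deltas[: revision - 1]:
            text = apply_delta(text, ops)
        return text
```

Reading revision n costs n - 1 delta applications here, and a single corrupted delta breaks every later revision. Storing whole copies avoids both problems.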

Programmer time is expensive, but disk space is usually cheap. Please consider whether storing every revision whole is really a problem.
