How to save all versions of messages in a MySQL database

It is common to keep every version of a message when it is edited (for example, on Stack Exchange sites), so that old versions can be restored. I am interested in the best way to save all versions.

Method 1. Save all versions in one table and add a column for the revision order or for marking the active version. This will make the table very long.

Method 2. Create an archive table to store older versions.

In both methods, I am interested in how this works with the row ID, which is the main identifier of the article.

+4
4 answers

The "best" way to keep a history of changes depends on your specific goals / limitations - and you did not mention them.

But here are some thoughts on your two suggested methods:

  • Create one table for messages and one for message history, for example:

    create table posts (
        id int primary key,
        userid int
    );

    create table posthistory (
        postid int,
        revisionid int,
        content varchar(1000),
        foreign key (postid) references posts(id),
        primary key (postid, revisionid)
    );

(Obviously there will be more columns, foreign keys, etc.) This is simple to implement and easy to understand, and it lets the DBMS maintain referential integrity. But, as you mentioned, posthistory may accumulate too many rows to search quickly enough.

Note that postid is a foreign key in posthistory (and the PK of posts).
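As a minimal sketch of how the schema above could be queried, the following uses Python's built-in sqlite3 module, with SQLite standing in for MySQL so that it runs without a server. The sample rows and the helper name latest_revision are assumptions made for illustration, not part of the answer.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        userid INTEGER
    );
    CREATE TABLE posthistory (
        postid INTEGER,
        revisionid INTEGER,
        content VARCHAR(1000),
        FOREIGN KEY (postid) REFERENCES posts(id),
        PRIMARY KEY (postid, revisionid)
    );
""")

# Hypothetical sample data: one post with three revisions.
conn.execute("INSERT INTO posts (id, userid) VALUES (1, 42)")
conn.executemany(
    "INSERT INTO posthistory (postid, revisionid, content) VALUES (?, ?, ?)",
    [(1, 1, "first draft"), (1, 2, "second draft"), (1, 3, "final text")],
)


def latest_revision(conn, postid):
    """Return (revisionid, content) of the newest revision, or None."""
    return conn.execute(
        """
        SELECT revisionid, content
        FROM posthistory
        WHERE postid = ?
        ORDER BY revisionid DESC
        LIMIT 1
        """,
        (postid,),
    ).fetchone()


print(latest_revision(conn, 1))
```

Because the lookup filters on postid and orders by revisionid, it can be served by the composite primary key index, which is one reason the row count in posthistory may matter less than it first appears.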

  • Use a denormalized schema where the latest revision of every post is in one table, and previous revisions are in a separate table. This requires more logic on the application side: when you add a new version, replace the post with the same id in the posts table, and also add it to the revision table.

(This may be what the SE sites use, judging by the data dump in the SE Data Explorer. Or maybe not, I can’t say.)

With this approach, postid is also a foreign key in the posthistory table and the primary key of the posts table.
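The application-side logic for the denormalized approach might look roughly like the sketch below. It assumes a content column on posts holding the current text, a hypothetical helper save_revision, and SQLite in place of MySQL so the example is runnable as is. Both writes happen in one transaction, so the current text and the history cannot drift apart.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; posts.content is an added column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        userid INTEGER,
        content VARCHAR(1000)      -- current text only
    );
    CREATE TABLE posthistory (
        postid INTEGER,
        revisionid INTEGER,
        content VARCHAR(1000),
        FOREIGN KEY (postid) REFERENCES posts(id),
        PRIMARY KEY (postid, revisionid)
    );
""")


def save_revision(conn, postid, content):
    """Replace the current text and append the same text to the history."""
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute(
            "UPDATE posts SET content = ? WHERE id = ?", (content, postid)
        )
        if cur.rowcount == 0:
            raise KeyError(f"no post with id {postid}")
        # Next revision number = highest existing one for this post, plus 1.
        conn.execute(
            """
            INSERT INTO posthistory (postid, revisionid, content)
            SELECT ?, COALESCE(MAX(revisionid), 0) + 1, ?
            FROM posthistory
            WHERE postid = ?
            """,
            (postid, content, postid),
        )


conn.execute("INSERT INTO posts (id, userid, content) VALUES (1, 42, NULL)")
save_revision(conn, 1, "first draft")
save_revision(conn, 1, "edited text")

current = conn.execute("SELECT content FROM posts WHERE id = 1").fetchone()[0]
history = conn.execute(
    "SELECT revisionid, content FROM posthistory "
    "WHERE postid = 1 ORDER BY revisionid"
).fetchall()
print(current, history)
```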

+9

In my opinion, an interesting approach is

  • to define another table, for example posts_archive (it will contain all the columns of the posts table + automatically generated primary key + optional date ...)
  • to populate this table through AFTER INSERT and AFTER UPDATE triggers defined on the posts table.
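A rough sketch of the trigger idea follows, written against SQLite through Python's sqlite3 module so that it runs without a server. The column names content, archive_id and archived_at and the trigger names are assumptions. MySQL's trigger syntax differs slightly (for example, it requires FOR EACH ROW), so treat the DDL as illustrative rather than something to paste into MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column and trigger names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        userid INTEGER,
        content VARCHAR(1000)
    );

    -- All columns of posts + generated primary key + optional date.
    CREATE TABLE posts_archive (
        archive_id INTEGER PRIMARY KEY AUTOINCREMENT,
        id INTEGER,
        userid INTEGER,
        content VARCHAR(1000),
        archived_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER posts_after_insert AFTER INSERT ON posts
    BEGIN
        INSERT INTO posts_archive (id, userid, content)
        VALUES (NEW.id, NEW.userid, NEW.content);
    END;

    CREATE TRIGGER posts_after_update AFTER UPDATE ON posts
    BEGIN
        INSERT INTO posts_archive (id, userid, content)
        VALUES (NEW.id, NEW.userid, NEW.content);
    END;
""")

# The application only touches posts; the triggers fill the archive.
conn.execute(
    "INSERT INTO posts (id, userid, content) VALUES (1, 42, 'first draft')"
)
conn.execute("UPDATE posts SET content = 'edited text' WHERE id = 1")

archive = conn.execute(
    "SELECT archive_id, id, content FROM posts_archive ORDER BY archive_id"
).fetchall()
print(archive)
```

The appeal of this design is that the application cannot forget to write the history, since every insert or update on posts is archived by the database itself.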
+2

If table size is a problem, the second option is the better choice: the active version can be returned quickly from the smaller table, while restoring an old version from the larger archive table takes longer. However, table size should not be a problem with a reasonable database and proper indexing.

In any case, you need a primary key made up of several columns, not just the row ID. The simplest answer is to include in the key a timestamp recording when each revision was created. The ID then continues to identify a specific article, and the ID and edit time together identify a specific revision of that article.
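To illustrate the timestamp-in-the-key idea, here is a small sketch using Python's sqlite3 module with SQLite in place of MySQL. The table name post_revisions, the column edited_at, the sample rows and the helper revision_as_of are all assumptions. Timestamps are stored as ISO-style strings, which sort in chronological order.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_revisions (
        id INTEGER,                 -- identifies the article
        edited_at TEXT,             -- identifies the revision of that article
        content VARCHAR(1000),
        PRIMARY KEY (id, edited_at)
    )
""")
conn.executemany(
    "INSERT INTO post_revisions (id, edited_at, content) VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 09:00:00", "first draft"),
        (1, "2024-01-02 10:30:00", "edited text"),
        (2, "2024-01-01 12:00:00", "another post"),
    ],
)


def revision_as_of(conn, post_id, moment):
    """Return the content the article had at the given time, or None."""
    row = conn.execute(
        """
        SELECT content
        FROM post_revisions
        WHERE id = ? AND edited_at <= ?
        ORDER BY edited_at DESC
        LIMIT 1
        """,
        (post_id, moment),
    ).fetchone()
    return row[0] if row else None


print(revision_as_of(conn, 1, "2024-01-01 23:59:59"))
print(revision_as_of(conn, 1, "2024-01-03 00:00:00"))
```

One thing to keep in mind with this key: two edits of the same article that land on the same timestamp value would violate the primary key, so the timestamp resolution has to be fine enough for the expected edit rate.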

+2

Working with temporal data is a well-known problem.

Method 1 simply changes your table's key: you end up with a table with columns messageID, version, description, ... and the primary key (messageID, version). Data is changed by adding a row with an incremented version. Queries are a bit more complicated.
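The sketch below shows what "a bit more complicated" can mean in practice: fetching the current version of every message requires joining the table against its own per-message maximum version. It uses Python's sqlite3 module with SQLite standing in for MySQL; the helper add_version and the sample data are assumptions.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; add_version is a hypothetical helper.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        messageID INTEGER,
        version INTEGER,
        description VARCHAR(1000),
        PRIMARY KEY (messageID, version)
    )
""")


def add_version(conn, message_id, description):
    """'Change' a message by adding a row with the next version number."""
    with conn:  # one transaction per new version
        conn.execute(
            """
            INSERT INTO messages (messageID, version, description)
            SELECT ?, COALESCE(MAX(version), 0) + 1, ?
            FROM messages
            WHERE messageID = ?
            """,
            (message_id, description, message_id),
        )


add_version(conn, 1, "hello")
add_version(conn, 1, "hello, world")
add_version(conn, 2, "other message")

# Current version of every message: join each row to its message's max version.
current = conn.execute("""
    SELECT m.messageID, m.version, m.description
    FROM messages AS m
    JOIN (
        SELECT messageID, MAX(version) AS version
        FROM messages
        GROUP BY messageID
    ) AS latest
      ON latest.messageID = m.messageID
     AND latest.version = m.version
    ORDER BY m.messageID
""").fetchall()
print(current)
```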

Method 2 is more tedious: you end up with one table keyed by rowID and a second table exactly like the one in method 1. Then, on every update, you have to remember to copy the data to the "backup table".

Method 3: the answer given by Matt

In my opinion, methods 1 and 3 are better. The schema is simpler in 1, but you can have untranslated data for your messages using method 3.

+2

Source: https://habr.com/ru/post/1395600/

