Best practice for a MySQL data versioning system

I need to store data such as articles in a MySQL database, and when an article is changed I also need to keep the old version so that I can restore it. I found several similar questions and posts on this topic, but I'm not sure which solution is the best.

Here is a basic "articles" table, for a better understanding:

articles (id, name, text)

I see two possible approaches:

Approach 1

Store the data and every version of the article in the "articles" table and add "version" and "status" columns. The "version" column holds the version number, incremented with each new version. The active version of an article gets status 1, and all older versions get status 2.

Pros:

  • Only one table required

  • Saving a new version is just an insert of the new row plus an update of the "status" column of the old one

Cons:

  • The table can grow very large (possibly slower queries?)
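Here is a minimal sketch of approach 1. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The column types and the helper names save_version, restore_version and active are hypothetical. The key point is that every version sits in one table, and "current" is simply the row with status 1.

```python
import sqlite3

# Approach 1 sketch: all versions live in one "articles" table.
# SQLite stands in for MySQL; types and helper names are assumptions.
SCHEMA = """
CREATE TABLE articles (
    id      INTEGER NOT NULL,
    version INTEGER NOT NULL,
    name    TEXT    NOT NULL,
    text    TEXT    NOT NULL,
    status  INTEGER NOT NULL,  -- 1 = active version, 2 = older version
    PRIMARY KEY (id, version)
);
"""

def save_version(conn, article_id, name, text):
    """Archive the active row (if any) and insert the new version as active."""
    with conn:  # one transaction: update and insert succeed or fail together
        (last,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM articles WHERE id = ?",
            (article_id,),
        ).fetchone()
        conn.execute(
            "UPDATE articles SET status = 2 WHERE id = ? AND status = 1",
            (article_id,),
        )
        conn.execute(
            "INSERT INTO articles (id, version, name, text, status) "
            "VALUES (?, ?, ?, ?, 1)",
            (article_id, last + 1, name, text),
        )
    return last + 1

def restore_version(conn, article_id, version):
    """Make an older version active again: two rows change status."""
    with conn:
        conn.execute(
            "UPDATE articles SET status = 2 WHERE id = ? AND status = 1",
            (article_id,),
        )
        conn.execute(
            "UPDATE articles SET status = 1 WHERE id = ? AND version = ?",
            (article_id, version),
        )

def active(conn, article_id):
    """Return (version, text) of the active version."""
    return conn.execute(
        "SELECT version, text FROM articles WHERE id = ? AND status = 1",
        (article_id,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_version(conn, 1, "Hello", "first draft")
save_version(conn, 1, "Hello", "second draft")
print(active(conn, 1))  # (2, 'second draft')
restore_version(conn, 1, 1)
print(active(conn, 1))  # (1, 'first draft')
```

Note that computing the next version number needs a query (MAX(version)), and restoring touches two rows.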

Approach 2

Add a "version" column to "articles" and keep only the current version of each article in that table. Old versions are moved to a new table, "articles_versioned".

Pros:

  • The "articles" table contains only current data

Cons:

  • Duplicated tables (two tables with the same columns)
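A minimal sketch of approach 2, again with sqlite3 standing in for MySQL. The helper name update_article is hypothetical. Before the current row is overwritten, it is copied into articles_versioned. Both tables end up with the same columns, which is the duplication listed as a con.

```python
import sqlite3

# Approach 2 sketch: "articles" holds only the current version,
# "articles_versioned" holds every superseded version.
SCHEMA = """
CREATE TABLE articles (
    id      INTEGER PRIMARY KEY,
    version INTEGER NOT NULL,
    name    TEXT    NOT NULL,
    text    TEXT    NOT NULL
);
CREATE TABLE articles_versioned (
    id      INTEGER NOT NULL,
    version INTEGER NOT NULL,
    name    TEXT    NOT NULL,
    text    TEXT    NOT NULL,
    PRIMARY KEY (id, version)
);
"""

def update_article(conn, article_id, name, text):
    """Move the current row into the archive, then overwrite it."""
    with conn:  # both statements in one transaction
        conn.execute(
            "INSERT INTO articles_versioned (id, version, name, text) "
            "SELECT id, version, name, text FROM articles WHERE id = ?",
            (article_id,),
        )
        conn.execute(
            "UPDATE articles SET version = version + 1, name = ?, text = ? "
            "WHERE id = ?",
            (name, text, article_id),
        )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO articles VALUES (1, 1, 'Hello', 'first draft')")
update_article(conn, 1, "Hello", "second draft")
print(conn.execute("SELECT version, text FROM articles").fetchall())
# [(2, 'second draft')]
print(conn.execute("SELECT version, text FROM articles_versioned").fetchall())
# [(1, 'first draft')]
```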

So, have I missed a good approach? And how should I handle related data in other tables (e.g. images)?

1 answer

My choice would be a variation of approach 2. The primary key of each table is stated explicitly below.

  • You insert every version of an article into the table articles_versioned (id, timestamp, name, text), whose primary key is (id, timestamp).
  • The second table is articles (id, timestamp, [name, text]), whose primary key is id alone. Note that timestamp is not part of its primary key. Name and text can be duplicated here, or you can join against articles_versioned instead (which is fast, since id and timestamp form the primary key of articles_versioned).
  • articles_versioned has an insert trigger that takes each newly inserted row and copies it into articles.
  • To restore a specific version of an article, you update the articles table so that it points at that version.
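A minimal sketch of this trigger-based design, with sqlite3 standing in for MySQL. Some names here are assumptions, not part of the answer: the column is called ts rather than timestamp, and the trigger name and the helpers save_version, current and restore are hypothetical. This sketch uses the join variant, so articles stores only a pointer (id, ts). The timestamps passed in the usage lines are example values.

```python
import sqlite3

SCHEMA = """
CREATE TABLE articles_versioned (
    id   INTEGER NOT NULL,
    ts   TEXT    NOT NULL DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now')),
    name TEXT    NOT NULL,
    text TEXT    NOT NULL,
    PRIMARY KEY (id, ts)
);
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,  -- one row per article
    ts TEXT    NOT NULL      -- timestamp of the current version
);
-- Every newly inserted version becomes the current one.
CREATE TRIGGER articles_versioned_ai AFTER INSERT ON articles_versioned
BEGIN
    INSERT OR REPLACE INTO articles (id, ts) VALUES (NEW.id, NEW.ts);
END;
"""

def save_version(conn, article_id, name, text, ts=None):
    """Insert into articles_versioned only; the trigger updates articles."""
    with conn:
        if ts is None:  # let the column default supply the current time
            conn.execute(
                "INSERT INTO articles_versioned (id, name, text) VALUES (?, ?, ?)",
                (article_id, name, text),
            )
        else:
            conn.execute(
                "INSERT INTO articles_versioned (id, ts, name, text) "
                "VALUES (?, ?, ?, ?)",
                (article_id, ts, name, text),
            )

def current(conn, article_id):
    """Read the current version by joining the pointer to its data."""
    return conn.execute(
        "SELECT v.name, v.text FROM articles a "
        "JOIN articles_versioned v ON v.id = a.id AND v.ts = a.ts "
        "WHERE a.id = ?",
        (article_id,),
    ).fetchone()

def restore(conn, article_id, ts):
    """Restoring is a single-row update; the history is untouched."""
    with conn:
        conn.execute("UPDATE articles SET ts = ? WHERE id = ?", (ts, article_id))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_version(conn, 1, "Hello", "first draft", "2024-01-01 10:00:00.000")
save_version(conn, 1, "Hello", "second draft", "2024-01-02 10:00:00.000")
print(current(conn, 1))  # ('Hello', 'second draft')
restore(conn, 1, "2024-01-01 10:00:00.000")
print(current(conn, 1))  # ('Hello', 'first draft')
```

In MySQL the trigger body would use REPLACE or INSERT ... ON DUPLICATE KEY UPDATE instead of SQLite's INSERT OR REPLACE. A DATETIME(6) column would keep fractional seconds.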

The advantages of this approach are as follows:

  • You get extra information for free (the date and time of each version), which you may need anyway
  • You do not need to query the database to work out the key of a new version: the current date and time is enough. With a version number, you would first have to look up the latest one.
  • Your code does not have to insert an article into two tables. You just insert into articles_versioned and read from articles. The database copies the data via the insert trigger, which avoids consistency issues.

Cons:

  • In a highly concurrent environment, two versions of the same article could be inserted with the same timestamp, so one of the inserts may fail. This should not be a problem for user-written articles (such collisions are very unlikely nowadays, given the precision of timestamps). If you do not specify the timestamp in your INSERT statement, and instead let the datetime column default to the current time, you can avoid this problem entirely.
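The sketch below shows how the collision described above surfaces, with sqlite3 standing in for MySQL. The helper try_insert and the timestamp values are hypothetical. Because the primary key is (id, ts), a second version of the same article with an identical timestamp is rejected as an integrity error, which the caller can catch and retry.

```python
import sqlite3

# With PRIMARY KEY (id, ts), two versions of one article that carry the
# same timestamp cannot both be stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles_versioned ("
    " id INTEGER NOT NULL, ts TEXT NOT NULL, name TEXT, text TEXT,"
    " PRIMARY KEY (id, ts))"
)

def try_insert(conn, article_id, ts, text):
    """Return True if the version was stored, False on a key collision."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO articles_versioned VALUES (?, ?, ?, ?)",
                (article_id, ts, "Hello", text),
            )
        return True
    except sqlite3.IntegrityError:
        # The second writer lost the race; it could retry with a fresh timestamp.
        return False

print(try_insert(conn, 1, "2024-01-01 10:00:00.000", "edit A"))  # True
print(try_insert(conn, 1, "2024-01-01 10:00:00.000", "edit B"))  # False
```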

To answer your remaining question: approach 1 will not make queries slower if you add an index on status. That index only makes sense if you have many versions of each article. If you have 2 versions per article or fewer on average, the index will only slow you down, and approach 2 will not be noticeably faster anyway. I would still recommend my approach, though, because it simplifies the code: restoring a version does not require switching the status of two rows.
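To check whether a status index is actually used, you can ask the database for its query plan. This sketch does that in SQLite via EXPLAIN QUERY PLAN, with the schema from approach 1; in MySQL the equivalent would be EXPLAIN. The index name idx_articles_status is hypothetical. The exact wording of the plan text varies between SQLite versions, so the comment shows only an example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id INTEGER NOT NULL, version INTEGER NOT NULL,
    name TEXT, text TEXT, status INTEGER NOT NULL,
    PRIMARY KEY (id, version));
CREATE INDEX idx_articles_status ON articles (status);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, name FROM articles WHERE status = 1"
).fetchall()
for row in plan:
    # The last column is the human-readable plan step, e.g.
    # "SEARCH articles USING INDEX idx_articles_status (status=?)"
    print(row[-1])
```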

Related resources, such as images, should be versioned in a similar way. I assume you store them on the file system. Instead of saving them under their real name, use a table (id, image_name) to give each image an identifier, and then save the image as -id-.jpg. The image_name field tells you what the original file name was (if you care). This way you can version images the same way as articles, and in articles you would use something like <img src="-id-.jpg">, which you know will remain available forever.
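A minimal sketch of the image scheme, with sqlite3 standing in for MySQL. The helper register_image is hypothetical, and the sketch assumes every image is stored as .jpg. AUTOINCREMENT is used so that SQLite never reuses an id. Uploading a new file under the same original name therefore gets a new id, and old article versions keep pointing at the old file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # ids are never reused
    " image_name TEXT NOT NULL)"              # original file name, for reference
)

def register_image(conn, original_name):
    """Record the original name and return the name to store the file under."""
    with conn:
        cur = conn.execute(
            "INSERT INTO images (image_name) VALUES (?)", (original_name,)
        )
    return f"-{cur.lastrowid}-.jpg"

first = register_image(conn, "holiday.jpg")
second = register_image(conn, "holiday.jpg")  # a new upload with the same name
print(first, second)  # -1-.jpg -2-.jpg
print(f'<img src="{second}">')  # <img src="-2-.jpg">
```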


Source: https://habr.com/ru/post/950535/

