The "best" way to keep a history of changes depends on your specific goals and constraints, which you did not mention.
But here are some thoughts on your two suggested methods:
- Create one table for posts and one for post history, for example:
    create table posts (
        id int primary key,
        userid int
    );

    create table posthistory (
        postid int,
        revisionid int,
        content varchar(1000),
        foreign key (postid) references posts(id),
        primary key (postid, revisionid)
    );
(Obviously a real schema will have more columns, foreign keys, etc.) This is simple to implement and easy to understand, and it is easy to let the DBMS maintain referential integrity. But, as you mentioned, posthistory may end up with too many rows to search quickly.
Note that postid in posthistory is a foreign key referencing the primary key of posts.
- Use a denormalized schema where the latest revision of every post is kept in one table and previous revisions are kept in a separate table. This requires more logic on the application side: when you add a new version, you replace the post with the same id in the posts table and also add the new version to the revision table.
(Judging by the data dump in the SE Data Explorer, this may be what the SE sites use. Or maybe not; I can't say.)
With this approach, postid is again a foreign key in the posthistory table, referencing the primary key of the posts table.
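To make the first option concrete, here is a minimal sketch that runs the answer's posts/posthistory DDL through Python's built-in sqlite3 module (SQLite is only a stand-in for whatever DBMS you actually use; the sample rows and the helper `latest_content` are hypothetical). Every revision, including the current one, lives in posthistory, so reading a post means picking its highest revisionid:

```python
import sqlite3

# Sketch of option 1: posts has one row per post, posthistory holds every
# revision (including the current one). Sample data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    create table posts (
        id int primary key,
        userid int
    );
    create table posthistory (
        postid int,
        revisionid int,
        content varchar(1000),
        foreign key (postid) references posts(id),
        primary key (postid, revisionid)
    );
""")

conn.execute("INSERT INTO posts (id, userid) VALUES (1, 42)")
conn.executemany(
    "INSERT INTO posthistory (postid, revisionid, content) VALUES (?, ?, ?)",
    [(1, 1, "first draft"), (1, 2, "fixed typo"), (1, 3, "final text")],
)
conn.commit()


def latest_content(conn, postid):
    """Hypothetical helper: return the newest revision's content, or None."""
    # The current version is the row with the highest revisionid for this post,
    # so every read of a post has to go through posthistory.
    row = conn.execute(
        """SELECT content FROM posthistory
           WHERE postid = ?
           ORDER BY revisionid DESC
           LIMIT 1""",
        (postid,),
    ).fetchone()
    return row[0] if row else None


print(latest_content(conn, 1))  # final text
```

The `PRAGMA foreign_keys = ON` line matters only for SQLite, which parses foreign key clauses but does not enforce them unless that pragma is set on the connection.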
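The second option can be sketched the same way, again using Python's sqlite3 module as an assumed stand-in for your DBMS. Here posts also carries the current `revisionid` and `content` (extra columns assumed for illustration), and the hypothetical helper `save_revision` performs the extra application-side step: it replaces the post in posts and appends the new version to posthistory, both inside one transaction:

```python
import sqlite3

# Sketch of option 2: posts holds the latest version of each post,
# posthistory holds every version. The revisionid/content columns on
# posts are assumptions added for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    create table posts (
        id int primary key,
        userid int,
        revisionid int,
        content varchar(1000)
    );
    create table posthistory (
        postid int,
        revisionid int,
        content varchar(1000),
        foreign key (postid) references posts(id),
        primary key (postid, revisionid)
    );
""")


def save_revision(conn, postid, userid, content):
    """Hypothetical helper: store a new version of a post and return its revisionid."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT revisionid FROM posts WHERE id = ?", (postid,)
        ).fetchone()
        if row is None:
            # First version: the parent row must exist before posthistory references it.
            revisionid = 1
            conn.execute(
                "INSERT INTO posts (id, userid, revisionid, content) VALUES (?, ?, ?, ?)",
                (postid, userid, revisionid, content),
            )
        else:
            # Later versions: replace the current text in posts.
            revisionid = row[0] + 1
            conn.execute(
                "UPDATE posts SET revisionid = ?, content = ? WHERE id = ?",
                (revisionid, content, postid),
            )
        # Every version, including the new current one, is appended to the history.
        conn.execute(
            "INSERT INTO posthistory (postid, revisionid, content) VALUES (?, ?, ?)",
            (postid, revisionid, content),
        )
    return revisionid


save_revision(conn, 1, 42, "first draft")
save_revision(conn, 1, 42, "final text")

# Reading the current version is a primary-key lookup on posts.
print(conn.execute("SELECT content FROM posts WHERE id = 1").fetchone()[0])  # final text
print(conn.execute("SELECT COUNT(*) FROM posthistory WHERE postid = 1").fetchone()[0])  # 2
```

Reads of the current version never touch posthistory; the trade-off is that every write updates two tables, which is why the sketch wraps both statements in a single transaction.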