How to handle changes to denormalized data

What is the best approach for updating a non-indexed regular column (one that is not part of a primary key) in every table that holds a duplicate of it?

For example, a user posts a message, and that post is duplicated across many tables for fast retrieval. When the message changes (because it was edited), it has to be updated in every table that contains the record, and those tables have different primary keys that are not known in advance.

The solutions I can think of:

  • Maintain a mapping table that holds the primary keys of the record in all of these tables, but this seems to lead to an explosion of tables (the message is not the only property that can change).
  • Use Solr to do the mapping, but I'm afraid I would be using it for the wrong purpose.

Any insight would be appreciated.

EDIT (fictional schema).

What if the message changes? Or even the user's display_name?

CREATE TABLE users (
    id uuid,
    display_name text,
    PRIMARY KEY ((id))
);

CREATE TABLE posts (
    id uuid,
    post text,
    poster_id uuid,
    poster_display_name text,
    tags set<text>,
    statistics map<int, bigint>,
    PRIMARY KEY ((id))
);

CREATE TABLE posts_by_user (
    user_id uuid,
    created timeuuid,
    post text,
    post_id uuid,
    tags set<text>,
    statistics map<int, bigint>,
    PRIMARY KEY ((user_id), created)
);
1 answer

It depends on how often the updates happen. For example, if users rarely change their names (a handful of times per user account), a secondary index may be good enough. Just be aware that a secondary index (2i) query is a scatter-gather across the cluster, so you will see performance problems if this is a common operation. In that case, you will want a materialized view (either the built-in kind available from Cassandra 3.0, or one you maintain yourself) so that you can fetch the list of all posts for a given user and then update the display name stored in each of them.
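As a rough sketch of the two lookup options, assuming the posts table from the question: the index name posts_poster_id_idx, the view name posts_by_poster, and the UUID literal are made up for illustration, and the view requires Cassandra 3.0 or later.

```cql
-- Option 1: secondary index (2i) on the duplicated user reference.
-- Queries against it fan out to the nodes of the cluster (scatter-gather).
CREATE INDEX IF NOT EXISTS posts_poster_id_idx ON posts (poster_id);

SELECT id FROM posts
WHERE poster_id = 123e4567-e89b-12d3-a456-426614174000;

-- Option 2: built-in materialized view, partitioned by poster.
-- The view key must contain every primary key column of the base table (id)
-- and may add at most one non-key column (poster_id).
CREATE MATERIALIZED VIEW IF NOT EXISTS posts_by_poster AS
    SELECT id, poster_id
    FROM posts
    WHERE poster_id IS NOT NULL AND id IS NOT NULL
    PRIMARY KEY ((poster_id), id);

-- A single-partition read instead of a cluster-wide scatter-gather.
SELECT id FROM posts_by_poster
WHERE poster_id = 123e4567-e89b-12d3-a456-426614174000;
```

If posts_by_user from the question is kept up to date by the application and is partitioned by user_id, it can already play the role of the self-managed view: selecting post_id from it for a given user_id yields the same list of posts.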

I recommend doing this in a background job and showing the user a message like "It may take [some unit of time] for your name change to be reflected everywhere."
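The statements such a job might issue could look roughly like the following. This is only a sketch against the schema from the question: the UUID literals and the new name are placeholders, posts_by_user is assumed to be partitioned by user_id, and the loop over the returned post IDs would live in application code that is not shown here.

```cql
-- 1. Change the name in the source-of-truth table.
UPDATE users
SET display_name = 'New Name'
WHERE id = 123e4567-e89b-12d3-a456-426614174000;

-- 2. Collect the IDs of every post written by that user.
SELECT post_id FROM posts_by_user
WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

-- 3. For each post_id returned by step 2, rewrite the duplicated name.
--    (One statement per post; the ID below is a placeholder.)
UPDATE posts
SET poster_display_name = 'New Name'
WHERE id = 9b2d3c4e-1f5a-4b6c-8d7e-0a1b2c3d4e5f;
```

Writing the same value to the same column twice leaves the row in the same state, so the job can simply be re-run from step 2 if it fails partway through.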


Source: https://habr.com/ru/post/1239619/

