How to maintain denormalized consistency in NoSQL?

Let's say I have two collections that are stored independently but are related to each other: photos and users. There is a one-to-many relationship between users and photos.

An example of denormalized data:

    users:
    {
      "id": "AABC",
      "name": "Donna Smith"
    }

    photos:
    {
      "id": "FAD4",
      "description": "cute dog",
      "user_id": "AABC",           // This is the relationship
      "user_name": "Donna Smith"   // This is the denormalized value from the "users" collection
    }

How can I keep the documents in the photos collection consistent when user AABC changes their name from Donna Smith to Donna Chang?

Since the store is non-transactional, I understand that consistency will at best be eventual.

A simple (naive) implementation can start a background job after user "AABC" changes their name, updating all photos where user_id = "AABC". For a single update this works well. But this is a multi-user environment, and updates arrive concurrently from all directions. What if, for example, halfway through the background photo update that changes "Donna Smith" to "Donna Chang", user "AABC" changes their name back to "Donna Smith"?
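One possible way to make such a background job tolerate overlapping renames is to stamp every name change with a version and let the job skip any photo that already holds a newer copy. The sketch below is not from the original question: the in-memory dicts stand in for the two collections, and the `name_version` and `user_name_version` fields and both function names are assumptions made for illustration. A real store would need an atomic conditional update to give the same guarantee.

```python
# In-memory stand-ins for the two collections (hypothetical; a real NoSQL
# store would need an atomic "update if version is older" operation).
users = {"AABC": {"id": "AABC", "name": "Donna Smith", "name_version": 1}}
photos = {
    "FAD4": {
        "id": "FAD4",
        "description": "cute dog",
        "user_id": "AABC",
        "user_name": "Donna Smith",
        "user_name_version": 1,  # assumed field: version of the copied name
    },
}


def rename_user(user_id, new_name):
    """Update the source of truth and return the (name, version) to fan out."""
    user = users[user_id]
    user["name"] = new_name
    user["name_version"] += 1
    return user["name"], user["name_version"]


def apply_name_to_photos(user_id, name, version):
    """Background job: copy the name into photos, skipping newer copies."""
    updated = 0
    for photo in photos.values():
        if photo["user_id"] == user_id and photo["user_name_version"] < version:
            photo["user_name"] = name
            photo["user_name_version"] = version
            updated += 1
    return updated


first = rename_user("AABC", "Donna Chang")   # becomes version 2
second = rename_user("AABC", "Donna Smith")  # becomes version 3

# The jobs run out of order: the newer rename is applied first.
apply_name_to_photos("AABC", *second)
apply_name_to_photos("AABC", *first)  # stale job, changes nothing

print(photos["FAD4"]["user_name"])  # Donna Smith
```

Because a stale job can never overwrite a newer copy, the photos converge on the latest name regardless of the order in which the jobs finish.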

Searching the Internet, I see a lot of discussion about how to model denormalized data. But any discussion of how to maintain it seems to be brushed off with "you will also need to update all related records." Are there any NoSQL systems that do the heavy lifting for you in this scenario? Any frameworks or utilities?

I read Thomas Wanschik's wonderful blog articles on "materialized views" and background updates for this particular scenario. But it bothers me that:

  • Background jobs should be delayed by a predetermined amount greater than the maximum time allowed for updates (how do I determine the delay? What if the operation takes longer?); and
  • This is the only discussion of a practical solution that I have found. NoSQL is very popular, so why don't I see more discussion like this? What am I missing?
1 answer

My early understanding of NoSQL was that it comes down to a true cost analysis of delivering huge amounts of data to a user or application.

When your application returns photos, what happens more often: delivering photos to the user and possibly to friends who are viewing them, or changing a user name?

Since changing the user name is the less common event in the application, this is where NoSQL denormalization earns its claim to fame: you can deliver large volumes of photo data to users at high speed without the cost of JOINs in a traditional normalized RDBMS environment.
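To make the read-side trade-off concrete, here is a small sketch that counts the extra user lookups needed to render photo captions from a normalized layout versus the denormalized one in the question. The function names and the caption format are hypothetical, and plain Python lists and dicts stand in for the collections.

```python
users = {"AABC": {"id": "AABC", "name": "Donna Smith"}}

# Normalized: the photo only references its owner.
photos_normalized = [
    {"id": "FAD4", "description": "cute dog", "user_id": "AABC"},
]

# Denormalized: the owner's name is copied into the photo document.
photos_denormalized = [
    {"id": "FAD4", "description": "cute dog", "user_id": "AABC",
     "user_name": "Donna Smith"},
]


def render_normalized(photos, users):
    """Build captions, counting one extra user lookup per photo."""
    lookups = 0
    captions = []
    for photo in photos:
        owner = users[photo["user_id"]]  # the "join", done by the application
        lookups += 1
        captions.append(f'{photo["description"]} by {owner["name"]}')
    return captions, lookups


def render_denormalized(photos):
    """Build captions from the photo documents alone, with no user lookups."""
    captions = [f'{p["description"]} by {p["user_name"]}' for p in photos]
    return captions, 0


print(render_normalized(photos_normalized, users))
print(render_denormalized(photos_denormalized))
```

Both paths produce the same captions; the denormalized path simply never touches the users collection, which is the saving paid for with the more expensive rename.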

A few tools available these days (you wrote this a long time ago) can help in such situations, but you are essentially correct that you can schedule code to handle this change. It will be slow and it will be expensive, but it will work, and you will still have the benefit of fast photo delivery, which is essentially the main purpose of your application.
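As a rough illustration of what such scheduled code might look like, the sketch below sweeps the photos and repairs any copied name that has drifted from the users collection. It is only an assumption about one possible shape for the job: `reconcile_photos` and the second photo `B7E1` are made up for the example, and the dicts stand in for real collections.

```python
users = {"AABC": {"id": "AABC", "name": "Donna Chang"}}
photos = {
    # Stale copy: still carries the old name.
    "FAD4": {"id": "FAD4", "description": "cute dog", "user_id": "AABC",
             "user_name": "Donna Smith"},
    # Hypothetical second photo that is already up to date.
    "B7E1": {"id": "B7E1", "description": "beach", "user_id": "AABC",
             "user_name": "Donna Chang"},
}


def reconcile_photos(users, photos):
    """Rewrite every photo whose copied user_name differs from the user's
    current name, and return the ids of the photos that were repaired."""
    repaired = []
    for photo in photos.values():
        user = users.get(photo["user_id"])
        if user is None:
            continue  # orphaned photo; left untouched in this sketch
        if photo["user_name"] != user["name"]:
            photo["user_name"] = user["name"]
            repaired.append(photo["id"])
    return repaired


first = reconcile_photos(users, photos)
print(first)  # ['FAD4']
```

The sweep reads the current name at the moment it runs instead of carrying a name in its payload, so running it twice, or after an interrupted run, is harmless. The cost is a full scan, which matches the "slow and expensive, but it works" trade-off described above.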

This question tends to turn into an epic novel, with SQL defenders on one side and the NoSQL "rabble" on the other. The traditional DBA shudders at the thought of compromising structure for speed, but NoSQL can be thought of as the old "super table" concept from long ago, when we designed around what would be returned rather than what needed to be stored. Essentially, this is what spawned the NoSQL concept, and it is very useful in large-scale applications and large data reports.

I know this is an old question, but I still hope that my answer will help others like me demystify the benefits of NoSQL when it comes to this type of question.


Source: https://habr.com/ru/post/1241674/

