Let's say I have two collections that are stored independently of each other but are related: photos and users. There is a one-to-many relationship between users and photos.
An example of denormalized data:
```
users:
{
  "id": "AABC",
  "name": "Donna Smith"
}

photos:
{
  "id": "FAD4",
  "description": "cute dog",
  "user_id": "AABC",         // This is the relationship
  "user_name": "Donna Smith" // This is the denormalized value from the "users" collection
}
```
How can I keep the documents in the photos collection consistent when user AABC changes their name from Donna Smith to Donna Chang?
Since the datastore is non-transactional, I understand that consistency will only be eventual.
A simple (naive) implementation could start a background job after user "AABC" changes their name, updating all photos where user_id = "AABC". For a single update this works well. But this is a multi-user environment, and updates can arrive concurrently from all directions. What if, for example, halfway through a background update of the photos from "Donna Smith" to "Donna Chang", user "AABC" changes their name back to "Donna Smith"?
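The race can be reproduced in a few lines. This is a minimal, single-process sketch, assuming in-memory dicts as hypothetical stand-ins for the two collections; `naive_fanout_job` and `rename` are illustrative names, not any datastore's API. The job is a generator so it can be paused between writes, imitating a slow background job with no transaction around its loop.

```python
# Hypothetical in-memory stand-ins for the "users" and "photos" collections.
users = {"AABC": {"id": "AABC", "name": "Donna Smith"}}
photos = [
    {"id": f"P{i}", "description": "cute dog", "user_id": "AABC", "user_name": "Donna Smith"}
    for i in range(4)
]

def naive_fanout_job(user_id, new_name):
    """Copy new_name into every photo of user_id, one document at a time.

    Yields after each write so the caller can interleave it with another job.
    """
    for photo in photos:
        if photo["user_id"] == user_id:
            photo["user_name"] = new_name  # the value captured when the job was created
            yield

def rename(user_id, new_name):
    users[user_id]["name"] = new_name          # update the source document
    return naive_fanout_job(user_id, new_name)  # then schedule the fan-out

job1 = rename("AABC", "Donna Chang")
next(job1)
next(job1)                            # job 1 updates the first two photos, then stalls
job2 = rename("AABC", "Donna Smith")  # a second rename arrives mid-job
for _ in job2:                        # job 2 runs to completion
    pass
for _ in job1:                        # job 1 resumes and writes its stale value
    pass

print(users["AABC"]["name"])             # Donna Smith
print([p["user_name"] for p in photos])  # ['Donna Smith', 'Donna Smith', 'Donna Chang', 'Donna Chang']
```

The user document ends up as "Donna Smith" while half of the photos say "Donna Chang", and nothing in the naive design will ever repair them.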
Searching the Internet, I find a lot of discussion on how to model denormalized data, but the question of how to maintain it is usually brushed off with "you will also need to update all related records." Are there any NoSQL systems that do the heavy lifting for you in this scenario? Any frameworks or utilities?
I have read Thomas Wanschik's excellent blog articles on "materialized views" and background updates for exactly this scenario. But two things bother me:
- Background jobs have to be delayed by a predetermined amount of time greater than the maximum time an update is allowed to take (how do I determine that delay? What if an operation takes longer?); and
- This is the only discussion of a practical solution I have found. Given how popular NoSQL is, why don't I see more discussion of this? What am I missing?
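For comparison with the naive job, here is a minimal sketch of one general delayed-refresh idea. It is an illustration under stated assumptions, not the code from those blog articles. The queued job carries only the user id and re-reads the user document when it runs, so a late job copies the latest name rather than a stale one. `UPDATE_DELAY`, `rename`, `refresh_photos`, and `run_due_jobs` are hypothetical names, and a heap with a simulated clock stands in for a real task queue with a countdown.

```python
import heapq
import itertools

# Hypothetical in-memory stand-ins for the two collections.
users = {"AABC": {"id": "AABC", "name": "Donna Smith"}}
photos = [
    {"id": f"P{i}", "user_id": "AABC", "user_name": "Donna Smith"} for i in range(4)
]

UPDATE_DELAY = 5          # assumed safety margin, in arbitrary time units
_seq = itertools.count()  # tie-breaker for jobs scheduled at the same time
queue = []                # min-heap of (run_at, seq, user_id)

def rename(user_id, new_name, now):
    users[user_id]["name"] = new_name
    # Enqueue only the user id; the job deliberately does not carry new_name.
    heapq.heappush(queue, (now + UPDATE_DELAY, next(_seq), user_id))

def refresh_photos(user_id):
    # Re-read the source document at run time instead of using a captured value.
    current = users[user_id]["name"]
    for photo in photos:
        if photo["user_id"] == user_id:
            photo["user_name"] = current

def run_due_jobs(now):
    # Run every job whose scheduled time has arrived, earliest first.
    while queue and queue[0][0] <= now:
        _, _, user_id = heapq.heappop(queue)
        refresh_photos(user_id)

rename("AABC", "Donna Chang", now=0)
rename("AABC", "Donna Smith", now=2)     # second rename before the first job has run
run_due_jobs(now=10)
print({p["user_name"] for p in photos})  # {'Donna Smith'}
```

Because this sketch is single-threaded, it hides the window on a real datastore between reading the name and writing the photos, and `UPDATE_DELAY` is exactly the guessed constant the first bullet objects to.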