Strategy for keeping separate databases in sync

I have a NoSQL database that we use to process data, because my application can work with it faster than with SQL. I treat the NoSQL database almost like a cache, with SQL as the data authority, and the NoSQL store is updated as changes come in. This is currently done through our application: when a change request comes in, it is written to both the SQL database and the NoSQL database. Sometimes this fails, either because a NoSQL update fails or because something else causes the NoSQL database to crash.

I could run a batch update every X minutes, but the data stores hold a lot of information, and a full synchronization would take several hours. We have timestamps for telling what has changed, but they are not always accurate.

What is the recommended strategy for keeping the secondary data store (the NoSQL cache) in sync with my primary data store?

1 answer

I have done this with messaging in the past, specifically JMS with ActiveMQ. I sent updates to the NoSQL store (Mongo) through a queue. That way messages can accumulate in the queue, and if the connection to the NoSQL store is ever lost, processing can pick up where it left off.

It worked very well because ActiveMQ was really stable and easy to use.
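
To illustrate the idea, here is a minimal sketch of the "resume where it left off" behaviour. It is not JMS or ActiveMQ code: an in-memory `deque` stands in for the durable queue, and `FlakyStore` and `drain` are hypothetical names made up for this example. The key point is that a message is removed from the queue only after the NoSQL write succeeds.

```python
from collections import deque


class FlakyStore:
    """Hypothetical stand-in for the NoSQL store; writes fail while `up` is False."""

    def __init__(self):
        self.docs = {}
        self.up = True

    def upsert(self, key, doc):
        if not self.up:
            raise ConnectionError("NoSQL store unreachable")
        self.docs[key] = doc


def drain(queue, store):
    """Apply queued updates in order and stop at the first failure.

    A message is removed only after its write succeeds, so a failed
    update stays at the head of the queue and is retried on the next call.
    Returns the number of updates applied.
    """
    applied = 0
    while queue:
        key, doc = queue[0]      # peek, do not remove yet
        try:
            store.upsert(key, doc)
        except ConnectionError:
            break                # leave the message queued for later
        queue.popleft()          # acknowledge only after success
        applied += 1
    return applied


queue = deque()
store = FlakyStore()

queue.append(("user:1", {"name": "Ann"}))
queue.append(("user:2", {"name": "Bob"}))

store.up = False
first = drain(queue, store)   # store is down: nothing applied, messages kept

store.up = True
second = drain(queue, store)  # store is back: both updates applied in order
```

In a real setup the queue would be persistent (that is what the broker provides), so the backlog also survives an application restart.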

I have also seen this done with diffs, as you mentioned. You put date fields everywhere and keep track of the last synchronization time. The best part about this approach is that it easily lets you replay transactions by moving the last synchronization date back.
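
A rough sketch of that timestamp approach, assuming a hypothetical `items` table with an integer `updated_at` column and a plain dict standing in for the NoSQL store (none of these names come from the question). Because each write is an upsert, replaying the same rows twice is harmless.

```python
import sqlite3


def sync_since(conn, store, last_sync):
    """Copy rows changed after `last_sync` into `store`; return the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM items "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    for row_id, name, updated_at in rows:
        # Upsert: overwriting an existing entry makes replays idempotent.
        store[row_id] = {"name": name, "updated_at": updated_at}
        last_sync = max(last_sync, updated_at)
    return last_sync


# In-memory SQLite database standing in for the SQL authority.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "a", 100), (2, "b", 200), (3, "c", 300)],
)

store = {}
watermark = sync_since(conn, store, 0)      # first run copies everything

del store[2]                                # simulate a lost NoSQL write
sync_since(conn, store, watermark)          # nothing is newer, so nothing is repaired
still_missing = 2 not in store

replayed = sync_since(conn, store, 150)     # move the watermark back to replay
```

The strict `>` comparison is a simplification: if timestamps are coarse or unreliable, as the question suggests, rewinding the watermark by a safety margin on each run is one way to avoid missing rows.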

One last tip: write good tools for moving data from point A to point B (in this case, from SQL to NoSQL). I wrote several tools for bulk loading the NoSQL store from SQL at my last job, and they made life easy whenever something got out of sync. Between the scripts and the bulk-load processes, I could always recover.


Source: https://habr.com/ru/post/1494590/

