I have several MongoDB collections that receive JSON documents from various streaming sources. In other words, a number of processes are constantly inserting data into this set of MongoDB collections.
I need a way to get the data out of MongoDB and into downstream applications. So I want the system to look like this:
App Stream1 -->
App Stream2 -->  MONGODB  --->  Aggregated Stream
App Stream3 -->
Or like this:
App Stream1 -->            --->  MongoD Stream1
App Stream2 -->  MONGODB   --->  MongoD Stream2
App Stream3 -->            --->  MongoD Stream3
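For context, the insert side is nothing fancy; it is roughly the sketch below (assuming the official MongoDB sync Java driver, with placeholder connection string, database/collection names, and payload):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.Date;

public class StreamInserter {
    public static void main(String[] args) {
        // Placeholder connection string and names, purely for illustration.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> events =
                    client.getDatabase("streams").getCollection("events");

            // Each streaming app simply inserts its JSON payload as a document;
            // Mongo assigns the _id, which gives us the generated identifiers
            // we depend on downstream.
            Document doc = new Document("source", "stream1")
                    .append("payload", new Document("foo", "bar"))
                    .append("receivedAt", new Date());
            events.insertOne(doc);
        }
    }
}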
The question is: how can I get the data out of Mongo to these downstream apps without having to constantly poll/query the database?
The obvious response to the question is: "why don't you change those stream apps to send the messages to a queue, such as Rabbit, Zero or ActiveMQ, and have it send them to both your MongoD streaming apps and to Mongo at once, like so:"
                   MONGODB
                     /|\
                      |
App Stream1 -->       |         --->  MongoD Stream1
App Stream2 -->  SomeMQqueue    --->  MongoD Stream2
App Stream3 -->                 --->  MongoD Stream3
In an ideal world, yes, that would be great, but we need Mongo to ensure that the messages are saved first, to avoid duplicates, to ensure that all the identifiers are generated, etc. Mongo has to sit in the middle as the persistence layer.
So, how do I stream messages out of a Mongo collection (not using GridFS, etc.) into these downstream streaming applications? The main school of thought so far has been simply to poll for new documents and, for each document collected, update it by setting an extra field on the JSON document stored in the database, much like a processed flag in a SQL table that stores a processed timestamp. I.e. every 1 second, poll for documents where processed == null, set processed = now(), and update the document.
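For reference, that polling approach would look roughly like the sketch below (again assuming the MongoDB sync Java driver; the collection name, the processed field, and the 1-second interval are just placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

import java.util.Date;

public class PollingConsumer {
    public static void main(String[] args) throws InterruptedException {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> events =
                    client.getDatabase("streams").getCollection("events");

            while (true) {
                // Atomically claim one unprocessed document by stamping it,
                // so the same message is never handed out twice.
                Document doc = events.findOneAndUpdate(
                        Filters.eq("processed", null),       // matches missing or null
                        Updates.set("processed", new Date()));

                if (doc == null) {
                    Thread.sleep(1000);  // nothing new yet: poll again in ~1 second
                } else {
                    forward(doc);        // hand the document to the downstream stream app
                }
            }
        }
    }

    private static void forward(Document doc) {
        System.out.println("Forwarding " + doc.toJson());
    }
}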
Is there a tidier / more computationally efficient method?
FYI, these are all Java processes.
Cheers!