Poll MongoDB on indexed field vs Tailable Cursor

The MongoDB documentation on tailable cursors says the following:

If your query is on an indexed field, do not use tailable cursors, but instead use a regular cursor. Keep track of the last value of the indexed field returned by the query. To retrieve newly added documents, query the collection again using the last value of the indexed field in the query criteria.

I am setting up the query to find all documents inserted after a certain point in time, and then to keep pulling down new documents as they are inserted. I assume the easiest way to do this is to query on _id (given that we are using ObjectIds, which we are) for anything $gt the last _id I pulled.

Since _id is indexed by default, how bad is it to constantly poll MongoDB with the last _id I received and keep asking for anything $gt it? I understand it will only be accurate to within a second or so, since ObjectIds only store seconds since the epoch, but I can live with that, so assume I will query at least once a second.
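
Roughly what I have in mind, as a sketch with the Node.js mongodb driver (the collection name and details are placeholders, not my real code):

    // Polling sketch: re-query with the last _id seen, roughly once a second.
    const { MongoClient, ObjectId } = require('mongodb');

    async function pollEvents() {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const events = client.db('test').collection('events');
      // Start from "now": ObjectIds embed a creation timestamp with 1-second resolution.
      let lastId = ObjectId.createFromTime(Math.floor(Date.now() / 1000));
      setInterval(async () => {
        // _id is indexed by default, so this range query should be cheap.
        const docs = await events.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).toArray();
        if (docs.length > 0) {
          lastId = docs[docs.length - 1]._id;
          docs.forEach((doc) => console.log('new document:', doc));
        }
      }, 1000);
    }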

I think I am just surprised that the documentation recommends repeated querying (presumably constant polling, in my case) over keeping a tailable cursor open: I would have thought pushing would be cheaper than pulling?

+6

4 answers

There is one big caveat here that I think you may not be paying attention to. Tailable cursors only work on capped collections. Using a capped collection is probably not a good general-purpose solution, so careful planning will be required to ensure that the collection is sized appropriately for your data size and growth.
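
For reference, a capped collection has to be created explicitly; a minimal mongo shell sketch (the name and limits are made-up examples):

    // Create a capped collection: size (in bytes) is required, max (documents) is optional.
    db.createCollection("events", { capped: true, size: 1024 * 1024 * 1024, max: 1000000 });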

+1

If you go with tailable cursors, there are a few issues I can think of:

  • You must fetch every document in the collection before you reach the end of the cursor.
  • You have to go back to the beginning if the cursor ever dies (even with its awaitData wait delay). So in case of an application restart, a db restart, etc., you have no option but to iterate from the very beginning.

In addition to the above, there are a few extra caveats to using tailable cursors, given that they only work on capped collections (see the sketch after this list):

  • Scalability is limited by the number of connections: each client connection adds a connection thread on the mongod (or mongos) servers.
  • Capped collections are fixed in size; the collection cannot grow beyond the size it was created with.
  • You cannot shard a capped collection.
  • Updates to documents in a capped collection must not cause a document to grow (i.e., not every $set will work, and no $push or $pushAll).
  • You cannot explicitly .remove() documents from a capped collection.
  • You have no control over when documents are deleted from the collection; it acts as a circular queue.
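
To make the trade-off concrete, here is a minimal tailable-cursor sketch with the Node.js mongodb driver (the db and collection names are assumptions for illustration):

    // Tailable cursor sketch: only works if "events" is a capped collection.
    const { MongoClient } = require('mongodb');

    async function tailEvents() {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const events = client.db('test').collection('events');
      // awaitData makes the server block briefly at the tail instead of returning immediately.
      const cursor = events.find({}, { tailable: true, awaitData: true });
      while (await cursor.hasNext()) {
        console.log('new document:', await cursor.next());
      }
      // Falling out of the loop means the cursor died; per the caveats above,
      // you have no choice but to start again from the beginning of the collection.
    }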

how bad is it to constantly poll MongoDB with the last _id I received and keep asking for anything $gt it?

IMO, polling introduces latency and unnecessary round trips even when there are no updates, but you have a lot more under your control.

Performance-wise it is reasonable; there should be no problem as long as you query on an indexed field.
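
If you want to confirm the index is actually being used, explain() is a quick check (mongo shell sketch; the collection name is hypothetical):

    // The winning plan should show an IXSCAN on { _id: 1 }, not a COLLSCAN.
    db.events.find({ _id: { $gt: ObjectId() } }).explain("executionStats");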

0

It sounds like you want to be notified of new / updated / deleted objects in the database. This is not possible with MongoDB without a bit of trickery. I assume you have read about tailing the oplog with tailable cursors, and that polling is always an absolute last resort. I have never tried oplog tailing because it seems rather limited (you cannot use it in shared db environments), unreliable, complicated to set up (it requires a replica set), and subject to change at any time in the future without warning. For example, the once-popular mongo-watch library is no longer maintained in favor of better alternatives.

DB mutation events are implemented in some databases: Postgres has triggers, and RethinkDB actually pushes changes to you. If you can switch to something like RethinkDB, that would be ideal.
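
For a sense of what that push model looks like, a minimal RethinkDB changefeed sketch with the official rethinkdb Node driver (the table name is illustrative):

    // RethinkDB changefeed: the server pushes each mutation to the client.
    const r = require('rethinkdb');

    async function watchEvents() {
      const conn = await r.connect({ host: 'localhost', port: 28015 });
      const cursor = await r.table('events').changes().run(conn);
      cursor.each((err, change) => {
        if (err) throw err;
        // Each change carries old_val and new_val describing the mutation.
        console.log('change:', change);
      });
    }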

If not, my best advice is to put a service layer in front of your db through which all traffic must go. Client applications can connect to that service over sockets (which is trivial with socket.io, and it has clients in almost every language). Every time your service layer processes an update, insert, or delete, it can push those events out to whoever is currently connected.
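
A minimal sketch of that service layer, assuming Node.js with socket.io and the mongodb driver (the event names, port, and collection are all made up for illustration):

    // Service layer: all writes go through this process, which persists the
    // document and then pushes it to every connected client.
    const { MongoClient } = require('mongodb');
    const { Server } = require('socket.io');

    async function main() {
      const mongo = await MongoClient.connect('mongodb://localhost:27017');
      const events = mongo.db('test').collection('events');
      const io = new Server(3000);

      io.on('connection', (socket) => {
        // Clients send writes through the service instead of hitting the db directly.
        socket.on('createEvent', async (payload) => {
          const result = await events.insertOne(payload);
          // Broadcast the new document in real time; no polling required.
          io.emit('eventCreated', { ...payload, _id: result.insertedId });
        });
      });
    }

    main();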

Limitations with this approach

  • All db traffic must pass through the service layer.

Cautions with this approach

  • If something updates the database directly, you will not see those changes immediately; you will have to query the db again. Not the end of the world.

Benefits of using this approach

  • It is better, more efficient, and closer to real time than polling.
  • You get a service layer that can apply much more business logic to your data, for example transforming data, validating data, sending emails, logging, updating other data sources, etc. ;)
  • It is a paradigm that works with any language and any db.
  • There are lightweight frameworks that already implement this architecture. FeathersJS is my favorite; you really should check it out. If you can use NodeJS, you should at least read up on how Feathers works.
0

The answers already given are great and accurate. However, when I first read your question, and maybe I do not quite understand exactly what you are trying to do, it sounded to me like this problem was made for Redis. It would be quite simple to set up a get/set cache; you could read from it and delete entries as necessary.

Also, the number of read/write operations, and of course other database operations, would stay normal, since you would be polling the cache instead.
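
A minimal sketch of that get/set cache with the node-redis client (the key names are hypothetical):

    // Redis cache sketch (node-redis v4): get/set/delete entries as needed.
    const { createClient } = require('redis');

    async function main() {
      const client = createClient(); // defaults to redis://localhost:6379
      await client.connect();

      // Writer side: stash the latest item in the cache as it is produced.
      await client.set('latestEvent', JSON.stringify({ msg: 'hello' }));

      // Reader side: poll the cache instead of MongoDB, then clean up.
      const raw = await client.get('latestEvent');
      if (raw !== null) {
        console.log('new document:', JSON.parse(raw));
        await client.del('latestEvent');
      }
    }

    main();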

Again, maybe I did not understand the problem correctly, but I have set up Redis before, and using it seems like the way to go in this situation. It sounds like a problem caching was made to solve.

0

Source: https://habr.com/ru/post/1015553/

