EDIT - Query performance:
As @NeilLunn pointed out in his comments, you should not filter documents manually, but use .find(...) instead:
db.snapshots.find({ roundedDate: { $exists: true }, stream: { $exists: true }, sid: { $exists: false } })
In addition, using .bulkWrite(), available as of MongoDB 3.2, will be far more performant than making individual updates.
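As a rough sketch of what that could look like (the filter and sid format are the ones used elsewhere in this answer; the chunk size of 1000 is just a placeholder you would tune):
const ops = [];

db.snapshots.find({
    roundedDate: { $exists: true },
    stream: { $exists: true },
    sid: { $exists: false }
}).forEach(doc => {
    // Queue one update per document instead of sending it right away:
    ops.push({
        updateOne: {
            filter: { _id: doc._id },
            update: { $set: { sid: `${ doc.stream.valueOf() }-${ doc.roundedDate }` } }
        }
    });

    // Flush in chunks so each bulkWrite request stays reasonably small:
    if (ops.length === 1000) {
        db.snapshots.bulkWrite(ops);
        ops.length = 0;
    }
});

if (ops.length > 0) db.snapshots.bulkWrite(ops);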
It is possible that, with that, you are able to execute your query within the 10-minute lifetime of the cursor. If it still takes longer than that, your cursor will expire and you will have the same problem anyway, which is explained below:
What's going on here:
Error: getMore command failed may be due to a cursor timeout, which is related to two cursor attributes:
The timeout limit, which is 10 minutes by default. From the docs:
By default, the server will automatically close the cursor after 10 minutes of inactivity or if the client has exhausted the cursor.
The batch size, which is 101 documents or 16 MB for the first batch, and 16 MB, regardless of the number of documents, for subsequent batches (as of MongoDB 3.4). From the docs:
find() and aggregate() operations have an initial batch size of 101 documents by default. Subsequent getMore operations issued against the resulting cursor have no default batch size, so they are limited only by the 16 megabyte message size.
You are probably consuming those initial 101 documents and then getting a 16 MB batch, which is the maximum, with a lot more documents. As it is taking more than 10 minutes to process them, the cursor on the server times out, and by the time you are done processing the documents in the second batch and request a new one, the cursor is already closed:
As you iterate through the cursor and reach the end of the returned batch, if there are more results, cursor.next() will perform a getMore operation to retrieve the next batch.
Possible solutions:
I see 5 possible ways to solve this: 3 good ones, with their pros and cons, and 2 bad ones:
👍 Reducing the batch size to keep the cursor alive.
👍 Removing the timeout from the cursor.
👍 Retrying when the cursor expires.
👎 Querying the results in batches manually.
👎 Getting all the documents before the cursor expires.
Please note that they are not numbered according to any specific criteria. Read them and decide which one is best for your particular case.
1. 👍 Reducing the batch size to keep the cursor alive
One way to solve this is to use cursor.batchSize to set the batch size on the cursor returned by your find query to match what you are able to process within those 10 minutes:
const cursor = db.collection.find().batchSize(NUMBER_OF_DOCUMENTS_IN_BATCH);
However, keep in mind that setting a very conservative (small) batch size will probably work, but it will also be slower, as now you need to access the server more times.
On the other hand, setting it to a value too close to the number of documents you are able to process in 10 minutes means that, if some iterations take a bit longer to process for any reason (other processes may be consuming more resources), the cursor may expire anyway and you will get the same error again.
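A minimal sketch of how this would look with the query from this answer (the batch size of 1000 is a hypothetical value; pick one you know you can process well within 10 minutes):
// Hypothetical batch size; tune it to what you can safely process in 10 minutes.
const BATCH_SIZE = 1000;

const cursor = db.snapshots.find({
    roundedDate: { $exists: true },
    stream: { $exists: true },
    sid: { $exists: false }
}).batchSize(BATCH_SIZE);

cursor.forEach(doc => {
    // Process each document here; every BATCH_SIZE documents the shell issues a
    // new getMore, which keeps the cursor alive on the server.
    db.snapshots.update(
        { _id: doc._id },
        { $set: { sid: `${ doc.stream.valueOf() }-${ doc.roundedDate }` } }
    );
});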
2. 👍 Removing the timeout from the cursor
Another option is to use cursor.noCursorTimeout to prevent the cursor from timing out:
const cursor = db.collection.find().noCursorTimeout();
This is considered a bad practice, as you will need to close the cursor manually or exhaust all of its results so that it is automatically closed:
After setting the noCursorTimeout option, you must either close the cursor manually with cursor.close() or exhaust all of the cursor's results.
As you want to process all the documents in the cursor, you wouldn't need to close it manually, but it is still possible that something else goes wrong in your code and an error is thrown before you are done, leaving the cursor open.
If you still want to use this approach, use a try-catch to make sure you close the cursor if anything goes wrong before you consume all of its documents.
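A minimal sketch of what that could look like (using try...finally so the close runs whether or not an error is thrown; the update inside the loop is just the one from this answer):
const cursor = db.snapshots.find({
    roundedDate: { $exists: true },
    stream: { $exists: true },
    sid: { $exists: false }
}).noCursorTimeout();

try {
    cursor.forEach(doc => {
        // Placeholder for your actual processing logic:
        db.snapshots.update(
            { _id: doc._id },
            { $set: { sid: `${ doc.stream.valueOf() }-${ doc.roundedDate }` } }
        );
    });
} finally {
    // Always release the cursor, even if something above throws,
    // so it does not stay open on the server indefinitely.
    cursor.close();
}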
Note: I don't consider this a bad solution (hence the 👍), even though it is usually regarded as a bad practice...:
It is a feature supported by the driver. If it were that bad, given that there are alternative ways to get around the timeout issues, as explained in the other solutions, it wouldn't be supported.
There are ways to use it safely; it is just a matter of being extra cautious with it.
I assume you are not running this kind of query regularly, so the chances that you start leaving open cursors everywhere are low. If that is not the case, and you really need to deal with these situations all the time, then it does make sense not to use noCursorTimeout.
3. 👍 Retrying when the cursor expires
Basically, you put your code in a try-catch and, when you get the error, you get a new cursor that skips the documents you have already processed:
let processed = 0;
let updated = 0;

while (true) {
    const cursor = db.snapshots.find().sort({ _id: 1 }).skip(processed);

    try {
        while (cursor.hasNext()) {
            const doc = cursor.next();

            ++processed;

            if (doc.stream && doc.roundedDate && !doc.sid) {
                db.snapshots.update({ _id: doc._id }, { $set: {
                    sid: `${ doc.stream.valueOf() }-${ doc.roundedDate }`
                }});

                ++updated;
            }
        }

        // Done processing all the documents, exit the outer loop:
        break;
    } catch (err) {
        // If it is not the "cursor not found" error (code 43), rethrow it;
        // otherwise, the outer loop starts again with a fresh cursor that
        // skips the documents already processed.
        if (err.code !== 43) {
            throw err;
        }
    }
}
Note that you need to sort the results for this solution.
With this approach, you minimize the number of requests to the server by using the maximum possible batch size of 16 MB, without having to guess in advance how many documents you will be able to process in 10 minutes. Therefore, it is also more robust than the previous approach.
4. 👎 Querying the results in batches manually
Basically, you use skip(), limit() and sort() to execute multiple queries, each with a number of documents that you think can be processed in 10 minutes.
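A rough sketch of what that would look like (PAGE_SIZE is a placeholder, with the same guessing problem as the batch size in solution 1):
const PAGE_SIZE = 1000; // Hypothetical; you would have to guess a safe value.
let skipped = 0;

while (true) {
    // Each iteration is an independent query, so no long-lived cursor is needed:
    const batch = db.snapshots.find()
        .sort({ _id: 1 })
        .skip(skipped)
        .limit(PAGE_SIZE)
        .toArray();

    if (batch.length === 0) break;

    batch.forEach(doc => {
        // Placeholder for your actual processing logic.
    });

    skipped += batch.length;
}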
I consider this a bad solution because the driver already has the option to set the batch size, so there is no reason to do this manually; just use solution 1 and don't reinvent the wheel.
Also, it is worth mentioning that it has the same drawbacks as solution 1.
5. 👎 Getting all the documents before the cursor expires
Your code is probably taking some time to execute due to the processing of the results, so you could retrieve all the documents first and process them afterwards:
const results = new Array(...db.snapshots.find());
This will fetch all the batches one after another and close the cursor. Then, you can loop through all the documents inside results and do what you need to do.
However, if you are having timeout issues, chances are that your result set is quite large, so pulling everything into memory may not be the best thing to do.
A note about snapshot mode and duplicate documents
It is possible that some documents are returned multiple times if intervening write operations move them due to a growth in document size. To solve this, use cursor.snapshot(). From the docs:
Append the snapshot() method to a cursor to toggle the "snapshot" mode. This ensures that the query will not return a document multiple times, even if intervening write operations result in a move of the document due to the growth in document size.
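For example, a minimal sketch of appending it to the query used in this answer:
const cursor = db.snapshots.find({
    roundedDate: { $exists: true },
    stream: { $exists: true },
    sid: { $exists: false }
}).snapshot(); // the query will not return the same document twice due to moves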
However, keep in mind its limitations:
It doesn't work with sharded collections.
It doesn't work with sort() or hint(), so it will not work with solutions 3 and 4.
It does not guarantee isolation from inserts or deletions.
Note that with solution 5 the time window for a move of documents that may cause duplicate document retrieval is narrower than with the other solutions, so you may not need snapshot().
In your particular case, as the collection is called snapshot, it is probably not likely to change, so you probably don't need snapshot(). Moreover, you are updating documents based on their own data and, once the update is done, that same document will not be updated again even if it is retrieved multiple times, as the if condition will skip it.
A note about open cursors
To see the number of open cursors, use db.serverStatus().metrics.cursor.
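For example, in the mongo shell (the field names below, such as timedOut and open.total, are the ones I would expect from recent server versions; check your own output):
// Inspect the cursor metrics reported by the server:
const cursorMetrics = db.serverStatus().metrics.cursor;

printjson({
    timedOut: cursorMetrics.timedOut,        // cursors closed because they timed out
    totalOpen: cursorMetrics.open.total,     // cursors currently open
    noTimeout: cursorMetrics.open.noTimeout  // open cursors created with noCursorTimeout
});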