How to optimize work with large data sets in MongoDB

We have several collections of roughly 10,000 documents each (and this number will keep growing) that are generated in node.js. They need to be stored, queried, filtered, and projected for a certain period of time, for which we use a MongoDB aggregation pipeline. Once certain conditions are met, the documents are regenerated and saved again.

Everything worked fine while we had around 5,000 documents: we inserted them as an array inside a single document and used $unwind in the aggregation pipeline. At some point, however, they no longer fit into one document because it exceeded MongoDB's 16 MB document size limit. We had to store every document individually via a bulk insert and add a group identifier to each one, so we know which "collection" it belongs to and can restrict the pipeline to just those documents.
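To make the two layouts concrete, here is roughly what the two pipelines look like (collection and field names such as groupId and docs are illustrative, not taken from our actual code):

    // Old layout: one container document per group, holding an array that is
    // unwound inside the pipeline.
    col.aggregate([
        { $match: { _id: groupId } },
        { $unwind: '$docs' }
        // ...further filter / projection stages...
    ]);

    // New layout: one MongoDB document per generated document, tagged with a
    // group identifier so the pipeline only touches that group's documents.
    col.aggregate([
        { $match: { group: groupId } }
        // ...further filter / projection stages...
    ]);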

Problem: writing the documents, which has to happen before we can query them in the pipeline, is prohibitively slow. The bulk.execute() call can take 10-15 seconds, whereas adding them to an array in node.js and writing a single document under 16 MB to MongoDB takes only a fraction of a second.

    var bulk = col.initializeOrderedBulkOp();

    // Queue every generated document individually, tagged with its group.
    for (var i = 0, l = docs.length; i < l; i++) {
        bulk.insert({
            doc   : docs[i],
            group : group.metadata
        });
    }

    // This call is what takes 10-15 seconds.
    bulk.execute(bulkOpts, function(err, result) {
        // ...
    });
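For comparison, the fast path mentioned above, collecting everything into one array in node.js and writing a single sub-16 MB document, looks roughly like this (the field names are assumptions):

    // Single-document approach: one write, but capped by the 16 MB BSON limit.
    col.insertOne({ group: group.metadata, docs: docs }, function(err, result) {
        // ...
    });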

How can we get rid of this excessive overhead?


Thoughts so far:

  • An in-memory store that handles the queries temporarily while the data is being written to disk.
  • The In-Memory Storage Engine (warning: considered beta, not for production), which requires a MongoDB Enterprise license.
  • Perhaps the WiredTiger storage engine has improvements over MMAPv1 beyond compression and encryption.
  • Still saving a single (array) document per group, but splitting it into fragments that stay under the 16 MB limit (see the sketch after this list).
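
A minimal sketch of that last idea, splitting the array into fragments and storing each fragment as its own document. The chunk size and field names here are assumptions; in practice the size would have to be derived from the serialized BSON size so each fragment stays safely under 16 MB:

    // Split the generated documents into slices and store each slice as one
    // MongoDB document tagged with the group and a sequence number.
    var CHUNK_SIZE = 1000; // assumption: tune so each chunk stays well below 16 MB
    var chunks = [];
    for (var i = 0; i < docs.length; i += CHUNK_SIZE) {
        chunks.push({
            group : group.metadata,
            seq   : i / CHUNK_SIZE,
            docs  : docs.slice(i, i + CHUNK_SIZE)
        });
    }

    // A handful of inserts instead of ~10,000 individual bulk operations.
    col.insertMany(chunks, function(err, result) {
        // ...
    });

The pipeline would then $match on group and $unwind the docs array, much like the original single-document setup.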

Source: https://habr.com/ru/post/1246974/

