MongoDB - aggregation output to another collection?

I have a process that currently uses the Mongo Map/Reduce framework, but it isn't performing very well. It's a fairly simple aggregation: I bucketize over 3 fields, return the sum of 4 different fields, and pass through the values of 4 more fields (which are constant within each bucket).

For the reasons described in [Map-Reduce Performance in MongoDb 2.2, 2.4, and 2.6], I would like to convert this to the aggregation framework for better performance, but I think three things stand in the way:

  • The total result can be large, exceeding MongoDB's 16MB limit, even though each individual result document is very small.
  • I can map/reduce directly into another collection, but the aggregation framework can only return results inline (I think?).
  • For incremental updates, as more data arrives in the source collection, I can map/reduce with MapReduceCommand.OutputType (in Java) set to REDUCE, which matches my use case exactly, but I don't see a corresponding feature in the aggregation framework.
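For reference, a map/reduce job of the kind described (3 bucket keys, 4 sums, 4 pass-through constants, merged incrementally into an output collection) might look roughly like this in the mongo shell. This is a sketch under assumptions: the field names a, b, c, s1..s4, k1..k4 and the collection names "events" and "buckets" are hypothetical placeholders, not the asker's actual schema. The shell's out: { reduce: ... } option is the counterpart of Java's MapReduceCommand.OutputType.REDUCE.

```javascript
// Sketch only: field names are hypothetical. a, b, c are the three bucket
// keys; s1..s4 are summed; k1..k4 are constant within each bucket.

// map: emit one partial result per source document, keyed by the bucket.
var mapFn = function () {
  emit(
    { a: this.a, b: this.b, c: this.c },
    {
      s1: this.s1, s2: this.s2, s3: this.s3, s4: this.s4,
      k1: this.k1, k2: this.k2, k3: this.k3, k4: this.k4
    }
  );
};

// reduce: returns a value with the same shape as the emitted values, so it
// can be re-applied to its own output. That matters for the REDUCE output
// mode, where MongoDB reduces the value already stored in the output
// collection together with the newly computed one.
var reduceFn = function (key, values) {
  var result = {
    s1: 0, s2: 0, s3: 0, s4: 0,
    // constant within a bucket, so the first value's copy is as good as any
    k1: values[0].k1, k2: values[0].k2, k3: values[0].k3, k4: values[0].k4
  };
  values.forEach(function (v) {
    result.s1 += v.s1;
    result.s2 += v.s2;
    result.s3 += v.s3;
    result.s4 += v.s4;
  });
  return result;
};

// Shell equivalent of MapReduceCommand.OutputType.REDUCE: merge results into
// the existing "buckets" collection (hypothetical name) instead of replacing it.
var mrOptions = { out: { reduce: "buckets" } };
```

In the shell this would run as db.events.mapReduce(mapFn, reduceFn, mrOptions). For a genuinely incremental run you would also add a query option selecting only the documents that arrived since the previous run; otherwise older data gets reduced into the totals a second time.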

Are there any good ways to solve these problems in the aggregation framework? The server is currently at version 2.4.3; we could upgrade if newer versions add features that help.

2 answers

Currently, the aggregation framework cannot write its output directly to another collection. However, you can try the answers in the Stack Overflow question about writing aggregate output to a new collection because map/reduce is slower; I was also waiting for a solution there. You could also try the Hadoop connector for MongoDB, which is supported by MongoDB (see the MongoDB website). Hadoop is faster at map/reduce, but I don't know whether it would suit your particular case.

Hadoop + MongoDB Connector Link

All the best.


Now you can do this with $out (added in MongoDB 2.6), as described in the MongoDB documentation:

$out takes the documents returned by the aggregation pipeline and writes them to a specified collection. The $out operator lets the aggregation framework return result sets of any size. The $out stage must be the last stage in the pipeline.

It has the following syntax, where <output-collection> is the collection that will hold the output of the aggregation operation. $out is only valid as the final stage of the pipeline:

 db.<collection>.aggregate( [ { <operation> }, { <operation> }, ..., { $out : "<output-collection>" } ] ) 
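Applied to the bucketing described in the question, a pipeline ending in $out might look like the sketch below. The field names (a, b, c as bucket keys, s1..s4 as summed fields, k1..k4 as per-bucket constants) and the collection names "events" and "buckets" are hypothetical placeholders. Note that $out replaces the contents of the target collection on each run rather than merging into it, so by itself it covers the 16MB and "write to a collection" points but not the incremental REDUCE-style update.

```javascript
// Sketch only: hypothetical field names standing in for the real schema
// (bucket keys a, b, c; summed s1..s4; per-bucket constants k1..k4).
var pipeline = [
  {
    $group: {
      // one output document per distinct (a, b, c) bucket
      _id: { a: "$a", b: "$b", c: "$c" },
      s1: { $sum: "$s1" },
      s2: { $sum: "$s2" },
      s3: { $sum: "$s3" },
      s4: { $sum: "$s4" },
      // constant within a bucket, so taking the first value is enough
      k1: { $first: "$k1" },
      k2: { $first: "$k2" },
      k3: { $first: "$k3" },
      k4: { $first: "$k4" }
    }
  },
  // Must be the last stage. Writes the results to "buckets" (hypothetical
  // name), replacing whatever that collection held before.
  { $out: "buckets" }
];

// Only runs inside the mongo shell, where `db` is defined; "events" is a
// hypothetical source collection.
if (typeof db !== "undefined") {
  db.events.aggregate(pipeline);
}
```

Using $first for the constant fields relies on the question's statement that those values are identical within each bucket; if they could differ, the choice of document would be arbitrary without a preceding $sort.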

Source: https://habr.com/ru/post/1487314/