How do I avoid saving data from my reduce() function in MongoDB?

In MongoDB, I'm trying to write Map-Reduce functions that only save data if they meet certain criteria.

I can't figure out how to make my reducer not emit() a result. It always saves the data anyway.

Here is a simplified example. Ignore the data's context; I made up this data and code solely for this question.

Data set:

    { "_id" : ObjectId("52583b3a58da9769dda48853"), "date" : "01-01-2013", "count" : 1 }
    { "_id" : ObjectId("52583b3d58da9769dda48854"), "date" : "01-01-2013", "count" : 1 }
    { "_id" : ObjectId("52583b4258da9769dda48855"), "date" : "01-02-2013", "count" : 1 }
    { "_id" : ObjectId("52583b4f58da9769dda48856"), "date" : "01-03-2013", "count" : 4 }

Map function:

    // Map all data by (date, count)
    var map = function() {
        var key = this.date;
        var value = this.count;
        emit(key, value);
    }

Reducer that simply ignores unwanted data:

    // Only save dates which have count > 2
    var reducer = function(date, counts) {
        var sum = Array.sum(counts);
        if (sum > 2) {
            return sum;
        }
    }

Results (the key with a value of 1 was not ignored):

    { "_id" : "01-01-2013", "value" : null }
    { "_id" : "01-02-2013", "value" : 1 }
    { "_id" : "01-03-2013", "value" : 4 }

I also tried adding an explicit empty return in an else branch, but got the same results:

    // Only save dates which have count > 2
    var reducer = function(date, counts) {
        var sum = Array.sum(counts);
        if (sum > 2) {
            return sum;
        } else {
            return;
        }
    }

What I would like is for only the following document to exist in my output collection after running Map-Reduce. How can I do this?

    { "_id" : "01-03-2013", "value" : 4 }
2 answers

You can run a second mapReduce operation over the output collection of the first one, with the following functions:

    var second_map = function() {
        if (this.value > 2) {
            emit(this._id, this.value);
        }
    }

and

    var second_reduce = function() {}

The reduce function can be empty: in this case each key has only one value, so reduce will never be called.

So running mapReduce as follows:

    db.map_reduce_example.mapReduce(second_map, second_reduce, {out: 'final_mapreduce_result'});

will create the following collection:

    > db.final_mapreduce_result.find()
    { "_id" : "01-03-2013", "value" : 4 }

Note that if you use this approach, you can remove the if (sum > 2) condition from the first reduce function.
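
To make the two-pass idea concrete, here is a minimal sketch in plain Node.js rather than the mongo shell. It assumes a simplified, hypothetical in-memory model of mapReduce (`simulateMapReduce`, which groups emitted values by key and only calls reduce for keys with more than one value) and passes `emit` to the map function as a parameter instead of using a global. Pass 1 sums without filtering; pass 2 filters in map, which runs exactly once per document.

```javascript
// Hypothetical in-memory model of mapReduce (not MongoDB itself): emitted values are
// grouped by key, and reduce is only called for keys with more than one value.
function simulateMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit); // `this` is the current document
  const out = [];
  for (const [key, values] of groups) {
    out.push({ _id: key, value: values.length === 1 ? values[0] : reduceFn(key, values) });
  }
  return out;
}

const data = [
  { date: "01-01-2013", count: 1 },
  { date: "01-01-2013", count: 1 },
  { date: "01-02-2013", count: 1 },
  { date: "01-03-2013", count: 4 },
];

// Pass 1: sum counts per date, with no filtering in reduce.
const firstMap = function (emit) { emit(this.date, this.count); };
const sumReduce = (key, values) => values.reduce((acc, v) => acc + v, 0);
const firstPass = simulateMapReduce(data, firstMap, sumReduce);

// Pass 2: filter in map over pass 1's output documents ({ _id, value }).
const secondMap = function (emit) {
  if (this.value > 2) emit(this._id, this.value);
};
const secondReduce = function () {}; // never called: each _id is emitted at most once
const finalResult = simulateMapReduce(firstPass, secondMap, secondReduce);
// finalResult → [ { _id: "01-03-2013", value: 4 } ]
```

Filtering in map is safe because map sees each complete per-date total exactly once, whereas reduce may only see partial groups.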


Remember that the reducer can be skipped entirely if a key has only one emitted value (from map()). You also should not try to filter results in reduce, since reduce can be called multiple times for the same key, each time with a subset of the emitted values, and the output of one call may be fed back into a later call.
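
The behaviour described above explains the question's output. The sketch below is a simplified, hypothetical Node.js model (not the MongoDB server's implementation): `arraySum` stands in for the shell's `Array.sum`, reduce is skipped for single-value keys, and an `undefined` return is stored as `null`, as observed in the question. The last two lines assume a hypothetical re-reduce order to show why a filtering reducer gives wrong totals.

```javascript
// Simplified model for illustration only; not how the MongoDB server is implemented.

// Array.sum is a mongo shell helper; this stand-in lets the sketch run under Node.js.
function arraySum(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// The reducer from the question: it returns nothing when the sum is 2 or less.
function filteringReducer(date, counts) {
  const sum = arraySum(counts);
  if (sum > 2) {
    return sum;
  }
}

// Group emitted values by key, skip reduce for single-value keys, and store null
// when reduce returns undefined (matching the output shown in the question).
function simulateMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit);
  const out = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    out.push({ _id: key, value: value === undefined ? null : value });
  }
  return out;
}

const data = [
  { date: "01-01-2013", count: 1 },
  { date: "01-01-2013", count: 1 },
  { date: "01-02-2013", count: 1 },
  { date: "01-03-2013", count: 4 },
];
// map() receives emit as a parameter here instead of as a global.
const map = function (emit) { emit(this.date, this.count); };

const results = simulateMapReduce(data, map, filteringReducer);
// 01-01-2013 → null (reduce ran and returned nothing)
// 01-02-2013 → 1    (single value, reduce skipped)
// 01-03-2013 → 4    (single value, reduce skipped)

// Hypothetical re-reduce: if [1, 1] were reduced first and that partial result were
// re-reduced with another 1, the filtered-out partial (undefined) poisons the sum.
const oneCall = filteringReducer("k", [1, 1, 1]); // 3
const twoCalls = filteringReducer("k", [filteringReducer("k", [1, 1]), 1]); // undefined
```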

The only other option is a finalize function, but that still won't remove the documents with null values from the results.
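
A finalize function receives each key's reduced value and returns the value to store; it has no way to drop the document itself. The minimal Node.js sketch below models that with a hypothetical `applyFinalize` helper, assuming the first pass used a non-filtering sum reducer. The best a "filtering" finalize can do is blank out the value.

```javascript
// Hypothetical model: finalize(key, reducedValue) only replaces a document's value;
// every reduced document still ends up in the output.
function applyFinalize(reducedDocs, finalizeFn) {
  return reducedDocs.map((doc) => ({ _id: doc._id, value: finalizeFn(doc._id, doc.value) }));
}

// Per-date sums for the question's data, assuming a non-filtering reducer.
const reduced = [
  { _id: "01-01-2013", value: 2 },
  { _id: "01-02-2013", value: 1 },
  { _id: "01-03-2013", value: 4 },
];

// Attempted filter in finalize: small totals become null, but the documents remain.
const finalizeFn = (key, value) => (value > 2 ? value : null);
const finalized = applyFinalize(reduced, finalizeFn);
// finalized.length → 3
```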

I think the only way to get the desired result is to use the aggregation framework instead of map-reduce. The pipeline would look like this:

    db.test.aggregate(
        { "$project" : { "_id" : 0, "date" : 1, "count" : 1 } },
        { "$group" : { "_id" : "$date", "value" : { "$sum" : "$count" } } },
        { "$match" : { "value" : { "$gt" : 2 } } }
    );
    { "result" : [ { "_id" : "01-03-2013", "value" : 4 } ], "ok" : 1 }
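
The key difference from the reducer is ordering: $match runs only after $group has produced each date's complete total. The plain Node.js sketch below emulates those two stages over the sample data (the helper names `groupSum` and `matchGreaterThan` are hypothetical, and it keeps insertion order, whereas the real $group does not guarantee output order).

```javascript
// Plain-JavaScript emulation of the pipeline's $group and $match stages (illustration only).
const docs = [
  { date: "01-01-2013", count: 1 },
  { date: "01-01-2013", count: 1 },
  { date: "01-02-2013", count: 1 },
  { date: "01-03-2013", count: 4 },
];

// $group: { _id: "$date", value: { $sum: "$count" } }
function groupSum(input) {
  const totals = new Map();
  for (const doc of input) {
    totals.set(doc.date, (totals.get(doc.date) || 0) + doc.count);
  }
  return Array.from(totals, ([date, value]) => ({ _id: date, value }));
}

// $match: { value: { $gt: 2 } } sees complete per-date totals, so dropping documents is safe.
function matchGreaterThan(input, threshold) {
  return input.filter((doc) => doc.value > threshold);
}

const pipelineResult = matchGreaterThan(groupSum(docs), 2);
// pipelineResult → [ { _id: "01-03-2013", value: 4 } ]
```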

The only major downside of this approach is that the results have to be returned inline, which limits their size to 16 MB. This will be addressed in version 2.6: https://jira.mongodb.org/browse/SERVER-10097

HTH, Rob.


Source: https://habr.com/ru/post/955799/

