MongoDB map-reduce loses data, returning null values

So strange. I am trying to use map-reduce to group datetime/metric pairs under a unique port name:

Document Layout:

{ "_id" : ObjectId("5069d68700a2934015000000"), "port_name" : "CL1-A", "metric" : "340.0", "port_number" : "0", "datetime" : ISODate("2012-09-30T13:44:00Z"), "array_serial" : "12345" } 

and mapreduce functions:

// Select one array serial and set of ports within the requested time window
var query = {
    'array_serial' : array,
    'port_name' : { $in : ports },
    'datetime' : { $gte : from, $lte : to }
};

var map = function() {
    emit( { portname : this.port_name },
          { datetime : this.datetime, metric : this.metric } );
};

var reduce = function(key, values) {
    var res = { dates : [], metrics : [], count : 0 };
    values.forEach(function(value) {
        res.dates.push(value.datetime);
        res.metrics.push(value.metric);
        res.count++;
    });
    return res;
};

var command = {
    mapreduce : collection,
    map : map.toString(),
    reduce : reduce.toString(),
    query : query,
    out : { inline : 1 }
};

mongoose.connection.db.executeDbCommand(command, function(err, dbres) {
    if (err) throw err;
    console.log(dbres.documents);
    res.json(dbres.documents[0].results);
});

If a small number of records is requested, say 5, 10, or even 60, I get all the data back that I expect. Larger queries return truncated values....


I just did a few more tests, and the output seems to be capped at 100 records? This is small data, and when I run the query for a 24-hour period I would expect 1440 records... I just ran it and got 80. :\

Is this expected? I am not specifying a limit anywhere that I can see...


Additional data:

A query for records from 2012-10-01T23:00 to 2012-10-02T00:39 (100 minutes) returns correctly:

 [ { "_id": { "portname": "CL1-A" }, "value": { "dates": [ "2012-10-01T23:00:00.000Z", "2012-10-01T23:01:00.000Z", "2012-10-01T23:02:00.000Z", ...cut... "2012-10-02T00:37:00.000Z", "2012-10-02T00:38:00.000Z", "2012-10-02T00:39:00.000Z" ], "metrics": [ "1596.0", "1562.0", "1445.0", ...cut... "774.0", "493.0", "342.0" ], "count": 100 } } ] 

... add one more minute to the query, 2012-10-01T23:00 to 2012-10-02T00:40 (101 minutes):

 [ { "_id": { "portname": "CL1-A" }, "value": { "dates": [ null, "2012-10-02T00:40:00.000Z" ], "metrics": [ null, "487.0" ], "count": 2 } } ] 

The dbres.documents object shows the correct, expected number of emitted records:

 [ { results: [ [Object] ], timeMillis: 8, counts: { input: 101, emit: 101, reduce: 2, output: 1 }, ok: 1 } ] 

... so the data gets lost somewhere?

+4
2 answers

Rule number one of MapReduce:

You must return from reduce the exact same format that you emit from map.

Rule number two of MapReduce:

You must be able to reduce the array of values passed to reduce any number of times. The reduce function can be called many times, including on its own previous output.

You have violated both of these rules in your implementation of reduce. When there are more than about 100 values for a key, MongoDB reduces them in batches (evidently around 100 at a time, matching the threshold you observed) and then re-reduces the partial results. Because your reduce returns { dates, metrics, count } while your map emits { datetime, metric }, the second pass finds no datetime or metric fields, which is exactly why you see nulls and a reset count once you cross 100 records.
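To make the failure concrete, here is a minimal sketch in plain JavaScript; the batch split is hypothetical, but it mirrors the counts { input: 101, ..., reduce: 2 } from the question:

// The reduce function from the question
var reduce = function(key, values) {
    var res = { dates : [], metrics : [], count : 0 };
    values.forEach(function(value) {
        res.dates.push(value.datetime);   // undefined when value is a partial result
        res.metrics.push(value.metric);
        res.count++;
    });
    return res;
};

// Values in the shape the original map emits
var batch1 = [
    { datetime : "2012-10-01T23:00:00Z", metric : "1596.0" },
    { datetime : "2012-10-01T23:01:00Z", metric : "1562.0" }
];
var batch2 = [
    { datetime : "2012-10-02T00:40:00Z", metric : "487.0" }
];

// First pass: one batch is reduced to a partial result in a different shape
var partial = reduce("CL1-A", batch1);  // { dates: [...], metrics: [...], count: 2 }

// Second pass: the partial result is re-reduced together with the remaining values;
// partial.datetime and partial.metric do not exist, so they come back as null
var result = reduce("CL1-A", [partial].concat(batch2));
// result: { dates: [undefined, "2012-10-02T00:40:00Z"],
//           metrics: [undefined, "487.0"],
//           count: 2 }   -- the null/count:2 output seen in the question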

Your map function emits key, value pairs:

key: the port name (you should just use the name as the key, not a document)
value: a document representing the three things you need to accumulate (dates, metrics, count)

Try this instead:

map = function() {
    // if you want to reduce to an array you have to emit arrays
    emit( this.port_name,
          { dates : [this.datetime], metrics : [this.metric], count : 1 } );
};

reduce = function(key, values) {
    // for each key you get an array of values
    var res = { dates : [], metrics : [], count : 0 };
    // you must reduce them to one
    values.forEach(function(value) {
        res.dates = value.dates.concat(res.dates);
        res.metrics = value.metrics.concat(res.metrics);
        res.count += value.count; // VERY IMPORTANT: the reduce result may be re-reduced
    });
    return res;
};
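As a quick sanity check of rule number two, using the corrected reduce above with a few hypothetical test values in the shape the corrected map emits: reducing in batches and then re-reducing the partial results gives the same answer as one reduce over all the values.

// Values in the shape the corrected map emits
var emitted = [
    { dates : ["2012-10-01T23:00:00Z"], metrics : ["1596.0"], count : 1 },
    { dates : ["2012-10-01T23:01:00Z"], metrics : ["1562.0"], count : 1 },
    { dates : ["2012-10-01T23:02:00Z"], metrics : ["1445.0"], count : 1 }
];

// Reduce in two batches, then re-reduce the partial results...
var rereduced = reduce("CL1-A", [
    reduce("CL1-A", emitted.slice(0, 2)),
    reduce("CL1-A", emitted.slice(2))
]);

// ...which matches a single reduce over everything:
var direct = reduce("CL1-A", emitted);
// both give all three dates, all three metrics, and count: 3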
+10

Try outputting the map-reduce results to a temp collection rather than in memory. This is the reason. From the Mongo docs:

{ inline : 1 } - With this option no collection will be created, and the whole map-reduce operation will happen in RAM. Also, the results of the map-reduce will be returned within the result object. Note that this option is only possible when the result set fits within the 16 MB limit of a single document. In v2.0, this is your only available option on a secondary member of a replica set.
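For example, a minimal sketch based on the command from the question (the output collection name port_metrics_mr is made up): write the results to a collection and read them back with a find:

var command = {
    mapreduce : collection,
    map : map.toString(),
    reduce : reduce.toString(),
    query : query,
    out : { replace : 'port_metrics_mr' }  // write to a collection instead of inline
};

mongoose.connection.db.executeDbCommand(command, function(err, dbres) {
    if (err) throw err;
    // The results now live in the output collection, not in dbres
    mongoose.connection.db.collection('port_metrics_mr', function(err, coll) {
        if (err) throw err;
        coll.find({}).toArray(function(err, results) {
            if (err) throw err;
            res.json(results);
        });
    });
});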

Also, this may not be the reason, but note that MongoDB has a data size limit of 2 GB on 32-bit machines.

+1

Source: https://habr.com/ru/post/1438120/

