Why do I lose some values every 100 documents?

I am trying to understand some behavior I see with map/reduce.

Here's the map function:

    function() {
      var klass = this.error_class;
      emit('klass', { model : klass, count : 1 });
    }

And the reduce function:

    function(key, values) {
      var results = { count : 0, klass: { foo: 'bar' } };
      values.forEach(function(value) {
        results.count += value.count;
        results.klass[value.model] = 0;
        printjson(results);
      });
      return results;
    }

Then I ran it:

    { "count" : 85, "klass" : { "foo" : "bar", "Twitter::Error::BadRequest" : 0 } }
    { "count" : 86, "klass" : { "foo" : "bar", "Twitter::Error::BadRequest" : 0, "Stream:DirectMessage" : 0 } }

At this point everything is fine, but here is the output when the read lock yields, which happens every 100 documents:

    { "count" : 100, "klass" : { "foo" : "bar", "Twitter::Error::BadRequest" : 0, "Stream:DirectMessage" : 0 } }
    { "count" : 100, "klass" : { "foo" : "bar", "undefined" : 0 } }

I kept my foo key, and the count attribute kept incrementing. The problem is that everything else became undefined.

So why am I losing the dynamic keys on my object while my count attribute is still correct?

1 answer

What you need to remember about your reduce function is that the values passed to it are either the output of your map function or the return value of previous calls to reduce.

This is key: it means that the map/reduce of parts of the data can be handled by different machines (for example, different shards of a Mongo cluster), and reduce is then used again to combine the partial results. It also means that Mongo does not have to map every value first, keep all the results in memory, and then reduce them all at once: it can map and reduce in chunks, re-reducing where necessary.
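A rough way to picture the chunked processing described above, outside of Mongo. The helper `reduceInChunks`, the `sumCounts` reduce, and the chunk size of 100 are all hypothetical stand-ins chosen to mirror the question; this is not how the server is implemented. Because `sumCounts` returns the same shape it receives, chunking does not change the total, which is also why `count` stayed correct in the question.

```javascript
// Hypothetical stand-in for the engine: reduce each chunk, then re-reduce the partial results.
function reduceInChunks(reduceFn, key, values, chunkSize) {
  var partials = [];
  for (var i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(key, values.slice(i, i + chunkSize)));
  }
  // The partial results go back into the same reduce function.
  return partials.length === 1 ? partials[0] : reduceFn(key, partials);
}

// A count-only reduce whose return value has the same shape as its inputs.
function sumCounts(key, values) {
  var total = { count: 0 };
  values.forEach(function (value) {
    total.count += value.count;
  });
  return total;
}

// 250 made-up emitted values, each shaped like { count: 1 }.
var emittedCounts = [];
for (var n = 0; n < 250; n++) {
  emittedCounts.push({ count: 1 });
}

var chunked = reduceInChunks(sumCounts, 'klass', emittedCounts, 100);
var allAtOnce = sumCounts('klass', emittedCounts);
console.log(chunked.count, allAtOnce.count); // prints 250 250
```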

In other words, the following should be true:

    reduce(k, [A, B, C]) == reduce(k, [A, reduce(k, [B, C])])
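To see that the reduce function from the question does not satisfy this property, it can be run in plain JavaScript outside the mongo shell. In this sketch `printjson` is removed, and `A`, `B`, `C` are made-up values shaped like the map function's output.

```javascript
// The reduce function from the question, minus printjson so it runs outside the mongo shell.
function reduceOriginal(key, values) {
  var results = { count: 0, klass: { foo: 'bar' } };
  values.forEach(function (value) {
    results.count += value.count;
    results.klass[value.model] = 0;
  });
  return results;
}

// Hypothetical emitted values, shaped like the map function's output.
var A = { model: 'Twitter::Error::BadRequest', count: 1 };
var B = { model: 'Stream:DirectMessage', count: 1 };
var C = { model: 'Twitter::Error::BadRequest', count: 1 };

var direct = reduceOriginal('klass', [A, B, C]);
var nested = reduceOriginal('klass', [A, reduceOriginal('klass', [B, C])]);

console.log(JSON.stringify(direct));
console.log(JSON.stringify(nested));
```

Both results have a count of 3, but `nested.klass` has lost the "Stream:DirectMessage" key and gained an "undefined" key, which is the symptom in the question.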

The output of your reduce function does not have a model property, so when that output is passed back in during a re-reduce, value.model is undefined and the result gets an "undefined" key instead of the real ones. The count survives because both the mapped values and the reduced values have a count property.

You either need your reduce function to return something in the same format as what your map function emits, so that you can process the two without distinction (usually the easiest option), or handle re-reduced values differently.
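A minimal sketch of the first option, assuming it is acceptable to change what the map function emits. The names `mapFixed` and `reduceFixed` are hypothetical, the `foo` debugging key is dropped, and the local `emit` is only a stand-in so the code runs outside Mongo (inside a real map/reduce job, `emit` is provided by the server).

```javascript
// Stand-in for Mongo's emit so the sketch runs outside the shell.
var emitted = [];
function emit(key, value) {
  emitted.push(value);
}

// Map emits a value in the same shape that reduce returns: { count, klass }.
function mapFixed() {
  var klass = {};
  klass[this.error_class] = 0;
  emit('klass', { count: 1, klass: klass });
}

// Reduce merges klass keys the same way for mapped and re-reduced values.
function reduceFixed(key, values) {
  var results = { count: 0, klass: {} };
  values.forEach(function (value) {
    results.count += value.count;
    Object.keys(value.klass).forEach(function (name) {
      results.klass[name] = 0;
    });
  });
  return results;
}

// Made-up documents with the error_class field used in the question.
var docs = [
  { error_class: 'Twitter::Error::BadRequest' },
  { error_class: 'Stream:DirectMessage' },
  { error_class: 'Twitter::Error::BadRequest' }
];
docs.forEach(function (doc) {
  mapFixed.call(doc);
});

var directFixed = reduceFixed('klass', emitted);
var nestedFixed = reduceFixed('klass', [emitted[0], reduceFixed('klass', emitted.slice(1))]);

console.log(JSON.stringify(directFixed));
console.log(JSON.stringify(nestedFixed));
```

With this shape, reducing everything at once and re-reducing a partial result give the same count and the same set of keys.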


Source: https://habr.com/ru/post/1435540/

