MongoDB MapReduce is much slower than pure Java processing?

I wanted to count every key that occurs in the documents of my collection, including the keys of embedded documents. First I wrote a Java client to solve the problem; it finished in under 4 seconds. Then I wrote a map/reduce job. Its result was correct, but it took more than 30 seconds to complete! I expected map/reduce to be faster, since it runs on the server side, while the Java client has to retrieve every document from the server. Why is the client nonetheless so much faster?

// Here is my map function:

map = function(){
    for(var key in this) {
      emit(key, {count:1});
      if(isNestedObject(this[key])){
        m_sub(key, this[key]);
      }
    }
}

// Here is my reduce function:

reduce = function (key, emits) {
    var total = 0;                 // declare locally, don't leak a global
    for (var i in emits) {
        total += emits[i].count;
    }
    return {count: total};
}

// Here is the mapreduce call:

mr = db.runCommand({"mapreduce":"keyword", "map" : map, "reduce" : reduce, 
    "scope":{
        isNestedObject : function (v) {
            return v && typeof v === "object";
        },
        m_sub : function(base, value) {
            for(var key in value) {
              emit(base + "." + key, {count:1});
              if(isNestedObject(value[key])){
                m_sub(base + "." + key, value[key]);
              }
            }
        }
    }
})

// Here is the result:

{
 "result" : "tmp.mr.mapreduce_1292252775_8",
 "timeMillis" : 39087,
 "counts" : {
  "input" : 20168,
  "emit" : 986908,
  "output" : 1934
 },
 "ok" : 1
}
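These counts already hint at where the time goes (a back-of-the-envelope calculation of mine, not part of the original output): the job performs roughly 49 emits per input document and reduces about 510 values per output key, all of it in JavaScript:

```java
public class MrCounts {
    public static void main(String[] args) {
        // Figures taken from the mapreduce result above.
        long input = 20_168, emit = 986_908, output = 1_934;
        System.out.println(emit / input);   // prints 48 -- emits per document
        System.out.println(emit / output);  // prints 510 -- values reduced per key
    }
}
```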

// Here is my Java client:

public static Set<String> recursiv(DBObject o) {
    Set<String> keysOut = new HashSet<String>();
    for (String s : o.keySet()) {
        Object value = o.get(s);
        // Recurse into embedded documents, but not into arrays (BasicDBList)
        // and not into nulls -- the getClass().getSimpleName().contains("Object")
        // check used originally matched only plain embedded objects, and would
        // throw a NullPointerException on null values.
        if (value instanceof DBObject && !(value instanceof List)) {
            for (String s2 : recursiv((DBObject) value)) {
                keysOut.add(s + "." + s2);
            }
        } else {
            keysOut.add(s);
        }
    }
    return keysOut;
}

    public static void main(String[] args) throws Exception {

        final Mongo mongo =  new Mongo("xxx.xxx.xxx.xxx");
        final DB db = mongo.getDB("keywords");
        final DBCollection keywordTable = db.getCollection("keyword");
        Multiset<String> count = HashMultiset.create();

        long start = System.currentTimeMillis();

        DBCursor curs = keywordTable.find();    
        while(curs.hasNext()){
            DBObject o = curs.next();
            Set<String> keys = recursiv(o);
            for(String s : keys){
                count.add(s);
            }
        }

        long end = System.currentTimeMillis();
        long duration = end - start;

        System.out.println(new SimpleDateFormat("mm:ss:SS").format(Long.valueOf(duration)));              
        System.out.println("duration:" + duration + " ms");
        //System.out.println(count);
        System.out.println(count.elementSet().size());

    }

// Here is the result:

00:03:726
duration:3726 ms
1898

The outputs differ slightly (1934 vs. 1898), most likely because the map function also descends into arrays and emits every intermediate key of embedded documents, while the Java client does neither. Still, the plain Java client, which has to pull every document over the network, is roughly ten times faster than the server-side map/reduce. Why is that, and can anything be done about it?

+3

As the O'Reilly Mongo book notes, the price of using JavaScript for map/reduce is speed: it is not meant to be used in real time, but rather to run as a background job. Also, Mongo only parallelizes map/reduce across nodes: to speed it up you shard the collection so that each node processes its own portion of the data; on a single node the job runs serially.

+9

The reason is that MongoDB executes map/reduce in JavaScript, which is slow by itself, and JavaScript execution inside mongod is effectively single-threaded (moving to the Google V8 JavaScript engine has been discussed as a way to improve this), so a map/reduce job cannot use more than one core per server. See:
http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Parallelism
https://jira.mongodb.org/browse/SERVER-2407
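Since the server-side JavaScript is single-threaded, one practical workaround is to keep the key extraction on the client and spread it over several threads. A minimal sketch (my own illustration, not from the original post; documents are modelled as plain `Map`s rather than driver `DBObject`s):

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.LongAdder;

public class ParallelKeyCount {

    // Flatten nested keys: {a: {b: 1}} -> counts for "a" and "a.b".
    static void collectKeys(String prefix, Map<?, ?> doc, Map<String, LongAdder> counts) {
        for (Map.Entry<?, ?> e : doc.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey().toString()
                                          : prefix + "." + e.getKey();
            counts.computeIfAbsent(key, k -> new LongAdder()).increment();
            if (e.getValue() instanceof Map) {
                collectKeys(key, (Map<?, ?>) e.getValue(), counts);
            }
        }
    }

    // Fan the per-document work out over a fixed thread pool; LongAdder
    // keeps the per-key counters contention-friendly.
    static Map<String, LongAdder> countKeys(List<Map<String, Object>> docs, int threads)
            throws InterruptedException {
        Map<String, LongAdder> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Map<String, Object> doc : docs) {
            pool.submit(() -> collectKeys("", doc, counts));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Object> nested = new HashMap<>();
        nested.put("city", "Berlin");
        Map<String, Object> doc1 = new HashMap<>();
        doc1.put("name", "a");
        doc1.put("address", nested);
        Map<String, Object> doc2 = new HashMap<>();
        doc2.put("name", "b");

        Map<String, LongAdder> counts = countKeys(Arrays.asList(doc1, doc2), 4);
        System.out.println(counts.get("name").sum());         // prints 2
        System.out.println(counts.get("address.city").sum()); // prints 1
    }
}
```

With a real cursor you would hand each fetched document to the pool the same way; whether this beats the single-threaded loop depends on how expensive the per-document work is relative to the network fetch.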

+5

Have a look at the aggregation framework instead. I was in a similar situation, with queries that were far too slow under MapReduce; in my experience the aggregation pipeline runs anywhere from 1 to 50 times faster than an equivalent MapReduce job.

We chose a design that partitions the data into collections of identical structure, which allowed us to run many small aggregation jobs; the aggregation command's pipeline concept works well for this.

I also found the $group stage very effective, but its limits on result size and on sharded collections restrict its use.
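For reference, the counting that $group does with {_id: "$key", count: {$sum: 1}} is the same per-key tally as the reduce function above; a minimal in-memory Java analogue of that grouping (illustrative only, not the MongoDB driver API):

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GroupCount {
    // In-memory equivalent of {$group: {_id: "$key", count: {$sum: 1}}}
    // over a list of already-extracted key names.
    static Map<String, Long> group(List<String> keys) {
        return keys.stream()
                   .collect(Collectors.groupingBy(Function.identity(),
                                                  Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = group(Arrays.asList("a", "b", "a"));
        System.out.println(counts.get("a")); // prints 2
        System.out.println(counts.get("b")); // prints 1
    }
}
```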

+1

Source: https://habr.com/ru/post/1780032/

