I wanted to count all the keys in the documents of my collection, including the keys of embedded documents. First I wrote a Java client to do this; it finished in under 4 seconds. Then I wrote a map/reduce function. The result was fine, but it took more than 30 seconds to run! I expected map/reduce to be faster, since it runs on the server side. The Java client has to retrieve every document from the server, yet it is still much faster. Why is this so?
// Here is my map function:
map = function () {
    for (var key in this) {
        emit(key, {count: 1});
        if (isNestedObject(this[key])) {
            m_sub(key, this[key]);
        }
    }
}
// Here is my reduce function:
reduce = function (key, emits) {
    var total = 0; // declared with var so it is not an implicit global
    for (var i = 0; i < emits.length; i++) {
        total += emits[i].count;
    }
    return {count: total};
}
// Here is the mapreduce call:
mr = db.runCommand({"mapreduce": "keyword", "map": map, "reduce": reduce,
    "scope": {
        isNestedObject: function (v) {
            return v && typeof v === "object";
        },
        m_sub: function (base, value) {
            for (var key in value) {
                emit(base + "." + key, {count: 1});
                if (isNestedObject(value[key])) {
                    m_sub(base + "." + key, value[key]);
                }
            }
        }
    }
})
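To make it easier to see what the map and reduce functions above produce, here is a minimal sketch that runs the same logic in plain JavaScript, outside MongoDB. The `emit` function, the `emitted` array, the grouping step, and the two sample documents are stand-ins invented for illustration; the real server may also call `reduce` more than once per key on partial results, which this sketch does not model.

```javascript
// Stand-in for MongoDB's emit(): records each (key, value) pair in an array.
// This is an assumption for local experimentation, not the server's implementation.
var emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

function isNestedObject(v) {
    return v && typeof v === "object";
}

function m_sub(base, value) {
    for (var key in value) {
        emit(base + "." + key, { count: 1 });
        if (isNestedObject(value[key])) {
            m_sub(base + "." + key, value[key]);
        }
    }
}

// Same body as the map function above; `this` is the document being mapped.
function map() {
    for (var key in this) {
        emit(key, { count: 1 });
        if (isNestedObject(this[key])) {
            m_sub(key, this[key]);
        }
    }
}

function reduce(key, emits) {
    var total = 0;
    for (var i = 0; i < emits.length; i++) {
        total += emits[i].count;
    }
    return { count: total };
}

// Hypothetical sample documents, invented for illustration.
var docs = [
    { name: "a", address: { city: "x", geo: { lat: 1, lon: 2 } } },
    { name: "b", address: { city: "y" } }
];

// Map phase: call map with each document bound to `this`.
docs.forEach(function (doc) { map.call(doc); });

// Group the emitted values by key, then reduce each group once.
var groups = {};
emitted.forEach(function (e) {
    (groups[e.key] = groups[e.key] || []).push(e.value);
});
var counts = {};
Object.keys(groups).forEach(function (key) {
    counts[key] = reduce(key, groups[key]).count;
});

console.log(emitted.length, counts);
```

Every key at every nesting level yields its own emit, so the number of emits grows with the total number of keys across all documents, not with the number of documents.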
// Here is the result:
{
    "result" : "tmp.mr.mapreduce_1292252775_8",
    "timeMillis" : 39087,
    "counts" : {
        "input" : 20168,
        "emit" : 986908,
        "output" : 1934
    },
    "ok" : 1
}
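To put the reported figures in proportion, the following small sketch derives a few ratios from them. The numbers are copied from the result above; the variable names are made up here, and dividing the whole run time by the number of emits is only a rough average, since the time also covers reading documents and running reduce.

```javascript
// Figures copied from the map/reduce result above.
var timeMillis = 39087;
var input = 20168;   // documents read
var emits = 986908;  // calls to emit()

// Average number of emit() calls per input document.
var emitsPerDoc = emits / input;

// Average wall-clock time per input document, in milliseconds.
var msPerDoc = timeMillis / input;

// Rough average time per emit, in microseconds (whole run time divided by emits).
var microsPerEmit = (timeMillis * 1000) / emits;

console.log("emits per document: " + emitsPerDoc.toFixed(2));
console.log("ms per document: " + msPerDoc.toFixed(2));
console.log("microseconds per emit: " + microsPerEmit.toFixed(1));
```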
// Here is my Java client:
public static Set<String> recursiv(DBObject o){
    Set<String> keysIn = o.keySet();
    Set<String> keysOut = new HashSet<String>();
    for(String s : keysIn){
        Set<String> keys2 = new HashSet<String>();
        if(o.get(s).getClass().getSimpleName().contains("Object")){
            DBObject o2 = (DBObject) o.get(s);
            keys2 = recursiv(o2);
            for(String s2 : keys2){
                keysOut.add(s + "." + s2);
            }
        }else{
            keysOut.add(s);
        }
    }
    return keysOut;
}
public static void main(String[] args) throws Exception {
    final Mongo mongo = new Mongo("xxx.xxx.xxx.xxx");
    final DB db = mongo.getDB("keywords");
    final DBCollection keywordTable = db.getCollection("keyword");
    Multiset<String> count = HashMultiset.create();
    long start = System.currentTimeMillis();
    DBCursor curs = keywordTable.find();
    while(curs.hasNext()){
        DBObject o = curs.next();
        Set<String> keys = recursiv(o);
        for(String s : keys){
            count.add(s);
        }
    }
    long end = System.currentTimeMillis();
    long duration = end - start;
    System.out.println(new SimpleDateFormat("mm:ss:SS").format(Long.valueOf(duration)));
    System.out.println("duration:" + duration + " ms");
    System.out.println(count.elementSet().size());
}
// Here is the result:
00:03:726
duration:3726 ms
1898
The two key counts differ (1934 from map/reduce versus 1898 from the Java client).
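The map/reduce run reports 1934 output keys while the Java client prints 1898. One difference that is visible in the code is that the map function emits a key even when its value is a nested object, whereas `recursiv` adds only the dotted child paths for such a key. The sketch below reproduces both behaviours in plain JavaScript; `allPaths`, `leafPaths`, and the sample document are hypothetical names and data introduced for illustration, and this may not be the only reason the counts differ.

```javascript
// Mirrors the map function: every key is recorded, including keys
// whose value is a nested object.
function allPaths(doc, base, out) {
    out = out || new Set();
    for (var key in doc) {
        var path = base ? base + "." + key : key;
        out.add(path);
        if (doc[key] && typeof doc[key] === "object") {
            allPaths(doc[key], path, out);
        }
    }
    return out;
}

// Mirrors the Java recursiv method: a key holding a nested object
// contributes only the paths of its children, not the key itself.
function leafPaths(doc) {
    var out = new Set();
    for (var key in doc) {
        var value = doc[key];
        if (value && typeof value === "object") {
            leafPaths(value).forEach(function (sub) {
                out.add(key + "." + sub);
            });
        } else {
            out.add(key);
        }
    }
    return out;
}

// Hypothetical sample document, invented for illustration.
var sample = { name: "a", address: { city: "x", geo: { lat: 1, lon: 2 } } };

console.log(Array.from(allPaths(sample)));  // parent keys and leaf keys
console.log(Array.from(leafPaths(sample))); // leaf keys only
```

For the sample document, the parent keys `address` and `address.geo` appear only in the first set, which is the kind of gap that would make the map/reduce count the larger of the two.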