MongoDB group-by slow even after adding an index

I have a simple collection:

 {
     "_id" : ObjectId("5033cc15f31e20b76ca842c8"),
     "_class" : "com.pandu.model.alarm.Alarm",
     "serverName" : "CDCAWR009 Integration Service",
     "serverAddress" : "cdcawr009.na.convergys.com",
     "triggered" : ISODate("2012-01-28T05:09:03Z"),
     "componentName" : "IntegrationService",
     "summary" : "A device which is configured to be recorded is not being recorded.",
     "details" : "Extension<153; 40049> on CDCAWR009 is currently not being recorded properly; recording requested for the following reasons: ",
     "priority" : "Major"
 }

The collection will have about two million such documents. I am trying to group by server name and get the number of documents for each server name. It sounds easy in terms of RDBMS queries.

The query that I have come up with is:

 db.alarm.group({
     key: { serverName: true },
     reduce: function(obj, prev) { prev.count++; },
     initial: { count: 0 }
 });

In addition, I added an index on serverName.

 > db.alarm.getIndexes()
 [
     { "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.alarm", "name" : "_id_" },
     { "v" : 1, "key" : { "serverName" : 1 }, "ns" : "test.alarm", "name" : "serverName_1" }
 ]

However, MongoDB takes about 13 seconds to respond, whereas a similar query in SQL Server returns within 4 seconds, and that is without an index.

Is there something I am missing?

Thanks in advance.

+3
mongodb
Aug 21 '12 at 20:31
2 answers

As you can see from the query you wrote, this kind of aggregation in 2.0 requires you to run Map/Reduce. Map/Reduce on MongoDB carries performance penalties that have been covered on SO before: unless you can parallelize across a cluster, you will be running single-threaded JavaScript through SpiderMonkey, which is not a speedy proposition. The index does not really help because the query is not selective; you still have to scan the whole index, and potentially the documents as well.
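To make that cost concrete, here is a rough plain-JavaScript sketch of the work a group() call like the one in the question describes. It is not MongoDB's implementation; `emulateGroup` and the sample documents are made up for illustration. The point is that the reduce function runs once per document, so two million documents mean two million interpreted JavaScript calls whether or not an index exists.

```javascript
// Hypothetical emulation of group({key, reduce, initial}) over an in-memory array.
// Illustrative only; this is not how the server is actually implemented.
function emulateGroup(docs, keyField, reduce, initial) {
  var groups = {}; // key value -> accumulator object
  var order = [];  // key values in first-seen order
  for (var i = 0; i < docs.length; i++) {
    var doc = docs[i];
    var k = doc[keyField];
    if (!Object.prototype.hasOwnProperty.call(groups, k)) {
      // Start a new accumulator: the key field plus a copy of `initial`.
      var acc = {};
      acc[keyField] = k;
      for (var f in initial) {
        if (Object.prototype.hasOwnProperty.call(initial, f)) {
          acc[f] = initial[f];
        }
      }
      groups[k] = acc;
      order.push(k);
    }
    // One JavaScript call per document: this is the expensive part.
    reduce(doc, groups[k]);
  }
  var out = [];
  for (var j = 0; j < order.length; j++) {
    out.push(groups[order[j]]);
  }
  return out;
}

// Made-up sample data standing in for the alarm collection.
var sample = [
  { serverName: "A" },
  { serverName: "B" },
  { serverName: "A" }
];

var result = emulateGroup(
  sample,
  "serverName",
  function (obj, prev) { prev.count++; },
  { count: 0 }
);
```

With the sample above, `result` holds one object per server name, each carrying the key and its count.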

With the imminent 2.2 release (in rc1 at the time of writing) you have some options. The aggregation framework introduced in 2.2 is native rather than based on JS Map/Reduce, has a built-in $group operator, and was created specifically to speed up operations like this in MongoDB.

I would recommend giving 2.2 a shot to see whether your grouping performance improves. I think it would look something like this (note: untested):

 db.alarm.aggregate( { $group : { _id : "$serverName", count : { $sum : 1 } }} ); 
+4
Aug 21 '12 at 21:23

Another option, and perhaps the most performant one at the moment, is to use the distinct() command and count the results on the client side. http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct
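A minimal sketch of that idea follows. Note that distinct() returns only the distinct values, so if per-name totals are wanted as well, this sketch assumes one count() query per value, each of which should be able to use the serverName index. The function name `countByServerName` is hypothetical, and `coll` is assumed to behave like `db.alarm` in the mongo shell (that is, to provide `distinct(field)` and `count(query)`).

```javascript
// Hypothetical helper: distinct server names plus a per-name document count.
// `coll` is assumed to expose distinct(field) and count(query), as a
// mongo shell collection such as db.alarm does.
function countByServerName(coll) {
  var names = coll.distinct("serverName"); // array of distinct values
  var counts = {};
  for (var i = 0; i < names.length; i++) {
    // One equality count per distinct name.
    counts[names[i]] = coll.count({ serverName: names[i] });
  }
  return { distinctNames: names.length, counts: counts };
}
```

In the mongo shell this would be called as `countByServerName(db.alarm)`. Whether it beats group() depends on how many distinct server names there are, since each one costs a separate query.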

+2
Aug 21 '12 at 21:25