I have a MongoDB collection named post with 35 million objects. The collection has two secondary indexes, defined as follows:
    > db.post.getIndexKeys()
    [
        { "_id" : 1 },
        { "namespace" : 1, "domain" : 1, "post_id" : 1 },
        { "namespace" : 1, "post_time" : 1, "tags" : 1 }  // tags is an array field
    ]
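For reference, equivalent commands to build the two secondary indexes in the 2.2 shell would look like this (a reconstruction from the getIndexKeys() output above, not the exact commands I originally ran):

    // Reconstructed index builds; key specs taken from getIndexKeys() above
    > db.post.ensureIndex({ "namespace" : 1, "domain" : 1, "post_id" : 1 })
    > db.post.ensureIndex({ "namespace" : 1, "post_time" : 1, "tags" : 1 })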
I expect the following query, which simply filters on namespace and post_time, to run in a reasonable amount of time without scanning all objects:
    > db.post.find({
          post_time : { "$gte" : ISODate("2013-04-09T00:00:00Z"),
                        "$lt"  : ISODate("2013-04-09T01:00:00Z") },
          namespace : "my_namespace"
      }).count()
    7408
However, MongoDB takes at least ten minutes to return the result and, curiously, it manages to scan 70 million objects to do so, according to the explain() output:
    > db.post.find({
          post_time : { "$gte" : ISODate("2013-04-09T00:00:00Z"),
                        "$lt"  : ISODate("2013-04-09T01:00:00Z") },
          namespace : "my_namespace"
      }).explain()
    {
        "cursor" : "BtreeCursor namespace_1_post_time_1_tags_1",
        "isMultiKey" : true,
        "n" : 7408,
        "nscannedObjects" : 69999186,
        "nscanned" : 69999186,
        "nscannedObjectsAllPlans" : 69999186,
        "nscannedAllPlans" : 69999186,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 378967,
        "nChunkSkips" : 0,
        "millis" : 290048,
        "indexBounds" : {
            "namespace" : [
                [ "my_namespace", "my_namespace" ]
            ],
            "post_time" : [
                [ ISODate("2013-04-09T00:00:00Z"), ISODate("292278995-01--2147483647T07:12:56.808Z") ]
            ],
            "tags" : [
                [ { "$minElement" : 1 }, { "$maxElement" : 1 } ]
            ]
        },
        "server" : "localhost:27017"
    }
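One comparison I have considered, to isolate the effect of the array field (a sketch I have not actually run; the name passed to hint() assumes MongoDB's default index naming), is to build a plain non-multikey index on namespace and post_time and hint the query onto it, to check whether both post_time bounds are then applied:

    // Sketch: compare the plan against a non-array compound index
    > db.post.ensureIndex({ "namespace" : 1, "post_time" : 1 })
    > db.post.find({
          post_time : { "$gte" : ISODate("2013-04-09T00:00:00Z"),
                        "$lt"  : ISODate("2013-04-09T01:00:00Z") },
          namespace : "my_namespace"
      }).hint("namespace_1_post_time_1").explain()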
The difference between the number of objects and the number of index entries scanned should be caused by the lengths of the tags arrays (which are all 2): at 2 index entries per document, the 35 million documents account for roughly the 70 million entries scanned. However, I do not understand why the post_time filter does not make use of the index: the indexBounds above show that the $lt bound on post_time is ignored, with the range extending to the maximum representable date instead.
Can you tell me what I might be missing?
(I work on a decent machine with 24 cores and 96 GB of RAM. I use MongoDB 2.2.3.)