Slow range query on a multikey index

I have a MongoDB collection named post with 35 million objects. The collection has two secondary indexes, defined as follows:

    > db.post.getIndexKeys()
    [
        { "_id" : 1 },
        { "namespace" : 1, "domain" : 1, "post_id" : 1 },
        { "namespace" : 1, "post_time" : 1, "tags" : 1 }  // tags is an array field
    ]

I expected the following query, which filters only on namespace and post_time, to run in a reasonable amount of time without scanning every object:

    > db.post.find({
          post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")},
          namespace: "my_namespace"
      }).count()
    7408

However, MongoDB takes at least ten minutes to return the result and, curiously, according to explain() it scans 70 million objects to do so:

    > db.post.find({
          post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")},
          namespace: "my_namespace"
      }).explain()
    {
        "cursor" : "BtreeCursor namespace_1_post_time_1_tags_1",
        "isMultiKey" : true,
        "n" : 7408,
        "nscannedObjects" : 69999186,
        "nscanned" : 69999186,
        "nscannedObjectsAllPlans" : 69999186,
        "nscannedAllPlans" : 69999186,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 378967,
        "nChunkSkips" : 0,
        "millis" : 290048,
        "indexBounds" : {
            "namespace" : [
                [ "my_namespace", "my_namespace" ]
            ],
            "post_time" : [
                [ ISODate("2013-04-09T00:00:00Z"), ISODate("292278995-01--2147483647T07:12:56.808Z") ]
            ],
            "tags" : [
                [ { "$minElement" : 1 }, { "$maxElement" : 1 } ]
            ]
        },
        "server" : "localhost:27017"
    }

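The difference between the number of documents (35 million) and the number of entries scanned (about 70 million) is presumably caused by the tags arrays, which have two elements each, since a multikey index stores one entry per array element. However, I do not understand why the index is not used to restrict post_time to the requested range.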
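To make the arithmetic concrete: because a multikey index stores one key per array element, each document with a two-element tags array contributes two keys to the (namespace, post_time, tags) index, so 35 million documents give roughly 70 million keys, which is consistent with the nscanned value of 69,999,186. The JavaScript sketch below builds a toy in-memory version of such an index. The helper `buildIndexEntries` and the sample documents are hypothetical, and this only models how keys are counted, not MongoDB's actual B-tree.

```javascript
// Toy model of a compound index on (namespace, post_time[, tags]).
// Hypothetical helper: emits one key per element of the array field (multikey),
// or a single key per document when no array field is indexed.
function buildIndexEntries(docs, fields, arrayField) {
  const entries = [];
  for (const doc of docs) {
    const base = fields.map((f) => doc[f]);
    if (arrayField && Array.isArray(doc[arrayField])) {
      for (const element of doc[arrayField]) {
        entries.push([...base, element]);
      }
    } else {
      entries.push(base);
    }
  }
  return entries;
}

// Hypothetical sample documents, each with two tags like the posts in the question.
const docs = [
  { namespace: "my_namespace", post_time: new Date("2013-04-09T00:10:00Z"), tags: ["a", "b"] },
  { namespace: "my_namespace", post_time: new Date("2013-04-09T02:00:00Z"), tags: ["b", "c"] },
  { namespace: "other", post_time: new Date("2013-04-08T12:00:00Z"), tags: ["a", "c"] },
];

const multikey = buildIndexEntries(docs, ["namespace", "post_time"], "tags");
const plain = buildIndexEntries(docs, ["namespace", "post_time"], null);

console.log(multikey.length); // 6: two keys per document
console.log(plain.length);    // 3: one key per document
```

Scaling the same ratio up, 35 million documents with two tags each give about 70 million index keys, while an index without the array field would hold one key per document.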

Can you tell me what I am missing?

(I am running this on a decent machine with 24 cores and 96 GB of RAM, using MongoDB 2.2.3.)

1 answer

I found the answer in this question: "Order of $lt and $gt in a MongoDB range query".

My index is multikey (because of tags), and I run a range query on post_time. Apparently, in this case MongoDB cannot use both ends of the range as index bounds, so it simply picks $gte, which comes first. Since my lower limit is close to the lowest post_time value in the collection, MongoDB ends up scanning almost all objects.
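The cost of losing the upper bound can be reproduced with a toy model. The sketch below keeps keys sorted by (namespace, post_time), starts a scan at the $gte lower bound, and counts how many keys it examines, either stopping at the $lt limit (both bounds used) or running to the end of the namespace (only $gte used, as in the explain output above, where the post_time upper bound is the maximum date). The functions `compareKeys` and `scanRange` and the synthetic data are hypothetical; MongoDB walks a B-tree rather than a sorted array, but the number of keys examined grows the same way.

```javascript
// Hypothetical model: index keys as [namespace, postTimeMillis] pairs, sorted
// the way a compound index on (namespace, post_time) orders them.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

// Scan from the first key >= [ns, lo]. With useUpperBound, stop at the first
// key whose time is >= hi ($lt used as an index bound); otherwise only the
// namespace bound ends the scan and $lt is checked as a filter on each key.
function scanRange(keys, ns, lo, hi, useUpperBound) {
  let examined = 0;
  let matched = 0;
  // A linear search keeps the sketch short; a real B-tree would seek directly.
  let i = keys.findIndex((k) => compareKeys(k, [ns, lo]) >= 0);
  if (i === -1) return { examined, matched };
  for (; i < keys.length; i++) {
    const [keyNs, t] = keys[i];
    if (keyNs !== ns) break;               // left the namespace bound
    if (useUpperBound && t >= hi) break;   // upper bound ends the scan
    examined++;
    if (t >= lo && t < hi) matched++;      // range filter on each examined key
  }
  return { examined, matched };
}

// Synthetic data: one key per hour for 1000 hours in a single namespace.
const HOUR = 3600 * 1000;
const start = Date.UTC(2013, 3, 9); // 2013-04-09T00:00:00Z (months are 0-based)
const keys = [];
for (let h = 0; h < 1000; h++) keys.push(["my_namespace", start + h * HOUR]);
keys.sort(compareKeys);

const lo = start;
const hi = start + HOUR; // one-hour window, like the query in the question

console.log(scanRange(keys, "my_namespace", lo, hi, true));  // { examined: 1, matched: 1 }
console.log(scanRange(keys, "my_namespace", lo, hi, false)); // { examined: 1000, matched: 1 }
```

With both bounds, the work is proportional to the number of matching keys; with only the lower bound, it is proportional to every key after the lower limit in the namespace. That is how the real query examined about 70 million keys to return 7,408 documents.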

Unfortunately, that is not the whole story. While trying to solve the problem, I created non-multikey indexes, but MongoDB insisted on using the bad one. That made me think the problem lay elsewhere. In the end, I had to drop the multikey index and recreate it without the tags field. Now everything works fine.


Source: https://habr.com/ru/post/944622/

