MongoDB $all and $in are very slow even on indexed fields

Question


I have a collection of approximately 80 million documents, each of which stores an array of tags in the tags field, for example:

 {text: "blah blah blah...", tags: ["car", "auto", "automobile"]}

The tags field is indexed, so naturally such queries are almost instantaneous:

  db.documents.find({tags:"car"})

However, the following queries are very slow, taking minutes to run:

  db.documents.find({tags:{$all:["car","phone"]}})
  db.documents.find({tags:{$in:["car","auto"]}})

The problem persists even if the array has only one element:

  db.documents.find({tags:{$all:["car"]}}) //very slow too

I expected $all and $in to be fast because tags is indexed, but apparently that is not the case. Why?


mongodb

ramirami Oct 6 '12 at 14:57


2 answers

ramirami · Answer 1 · 2012-10-06T16:49:01+0000

It turns out this is a known bug in MongoDB that, as of version 2.2, has not yet been fixed.

MongoDB does not intersect indexes when matching multiple elements with $all. Only the first element of the array is looked up in the index, and every document matched by that lookup is then checked against the remaining elements to filter the results.

For example, for the query db.documents.find({tags:{$all:["car","phone"]}}), MongoDB has to load and scan every document tagged "car". Since the collection in question contains more than a hundred thousand documents tagged "car", the slowdown is not surprising.
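To make the cost concrete, here is a toy JavaScript model of the evaluation strategy described above, assuming the answer's description of MongoDB 2.2 is accurate. It is not MongoDB's code: `buildTagIndex` and `findAllFirstElement` are hypothetical names, and a `Map` from tag to document positions stands in for the multikey index on `tags`. The model counts how many documents are examined when only the first `$all` element is looked up in the index.

```javascript
// Toy model, NOT MongoDB source code: $all evaluated the way the answer
// describes MongoDB 2.2 doing it. Only the FIRST element is looked up in
// the index; every candidate is then loaded and checked for the rest.

// Multikey-style index: tag -> array of document positions.
function buildTagIndex(docs) {
  const index = new Map();
  docs.forEach((doc, pos) => {
    for (const tag of new Set(doc.tags)) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(pos);
    }
  });
  return index;
}

// Returns the matching docs and how many docs had to be examined.
function findAllFirstElement(docs, index, wanted) {
  const candidates = index.get(wanted[0]) || [];
  const results = [];
  for (const pos of candidates) {
    const doc = docs[pos]; // "load" the full document
    if (wanted.every((t) => doc.tags.includes(t))) results.push(doc);
  }
  return { results, docsExamined: candidates.length };
}

// Example data: 1000 docs tagged "car", 10 of which are also tagged "phone".
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ text: "doc " + i, tags: i < 10 ? ["car", "phone"] : ["car", "auto"] });
}
const index = buildTagIndex(docs);
const { results, docsExamined } = findAllFirstElement(docs, index, ["car", "phone"]);
console.log(results.length, docsExamined); // 10 1000
```

With 1,000 documents tagged "car" and only 10 also tagged "phone", all 1,000 are examined to return 10 results, the same imbalance the answer describes at a much larger scale.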

Worse, MongoDB does not even perform the simple optimization of using the least frequent element of the $all array for the index lookup. If there are 100,000 documents tagged "car" and 10 documents tagged "phone", MongoDB still has to scan 100,000 documents to return the results for {$all:["car", "phone"]}.
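The missing optimization can be sketched in the same kind of toy model. This is a hypothetical JavaScript illustration, not MongoDB internals, and `postingSize` and `findAllRarestElement` are made-up names: pick the `$all` element with the fewest index entries, look up only that one, and filter its candidates by the remaining elements. It assumes a non-empty `$all` array.

```javascript
// Toy model, NOT MongoDB source code: the "least represented element"
// optimization that the answer says MongoDB 2.2 does not perform.

// Multikey-style index: tag -> array of document positions.
function buildTagIndex(docs) {
  const index = new Map();
  docs.forEach((doc, pos) => {
    for (const tag of new Set(doc.tags)) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(pos);
    }
  });
  return index;
}

// Number of index entries for a tag (0 if the tag never occurs).
function postingSize(index, tag) {
  return (index.get(tag) || []).length;
}

// Look up the rarest $all element in the index, then filter its
// candidates by the remaining elements. Assumes `wanted` is non-empty.
function findAllRarestElement(docs, index, wanted) {
  const rarest = wanted.reduce((best, tag) =>
    postingSize(index, tag) < postingSize(index, best) ? tag : best
  );
  const candidates = index.get(rarest) || [];
  const results = candidates
    .map((pos) => docs[pos])
    .filter((doc) => wanted.every((t) => doc.tags.includes(t)));
  return { rarest, results, docsExamined: candidates.length };
}

// Example data: 1000 docs tagged "car", 10 of which are also tagged "phone".
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ tags: i < 10 ? ["car", "phone"] : ["car", "auto"] });
}
const index = buildTagIndex(docs);
const out = findAllRarestElement(docs, index, ["car", "phone"]);
console.log(out.rarest, out.results.length, out.docsExamined); // phone 10 10
```

Here only the 10 "phone" documents are examined instead of all 1,000 "car" documents. If only the first element really is used for the index lookup, listing the rarest tag first in the $all array on the client side might approximate this, but that is an assumption, not verified here.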

See also: https://jira.mongodb.org/browse/SERVER-1000

J. Chang · Answer 2 · 2012-11-14T10:34:41+0000

I just want to add that $in is fast. With only one criterion or keyword, $in is equivalent to $all, yet $in is fast and $all is slower.

So use $in.
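For the single-element case, the equivalence can be illustrated with a small JavaScript sketch. `matchesIn`, `matchesAll`, and `rewriteSingleAll` are hypothetical helpers: the first two mimic the documented matching semantics of `$in` (any listed value present) and `$all` (every listed value present) on an array field, ignoring edge cases such as an empty value list, and the third rewrites a single-element `$all` into `$in` before the query is sent. Multi-element `$all` is left alone, because it is not equivalent to `$in`.

```javascript
// Toy matchers for an array field (edge cases such as an empty value
// list are ignored): $in matches if ANY value is present, $all if EVERY
// value is present.
function matchesIn(arr, values) {
  return values.some((v) => arr.includes(v));
}
function matchesAll(arr, values) {
  return values.every((v) => arr.includes(v));
}

// Hypothetical helper: rewrite {field: {$all: [x]}} to {field: {$in: [x]}}.
// Only conditions consisting solely of a one-element $all are rewritten.
function rewriteSingleAll(query) {
  const out = {};
  for (const [field, cond] of Object.entries(query)) {
    const isSingleAll =
      cond !== null &&
      typeof cond === "object" &&
      Object.keys(cond).length === 1 &&
      Array.isArray(cond.$all) &&
      cond.$all.length === 1;
    out[field] = isSingleAll ? { $in: cond.$all } : cond;
  }
  return out;
}

const tags = ["car", "auto", "automobile"];
console.log(matchesIn(tags, ["car"]) === matchesAll(tags, ["car"])); // true
console.log(JSON.stringify(rewriteSingleAll({ tags: { $all: ["car"] } })));
// {"tags":{"$in":["car"]}}
```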
