MongoDB $all and $in are very slow even on indexed fields

I have a collection of approximately 80 million documents, each of which stores an array of tags in the tags field, for example:

 {text: "blah blah blah...", tags: ["car", "auto", "automobile"]} 

The tags field is indexed, so naturally such queries are almost instantaneous:

  db.documents.find({tags:"car"}) 

However, the following queries are very slow and take minutes to run:

  db.documents.find({tags:{$all:["car","phone"]}})
  db.documents.find({tags:{$in:["car","auto"]}})

The problem persists even if the array has only one element:

  db.documents.find({tags:{$all:["car"]}}) //very slow too 

I thought $all and $in would be very fast because tags is indexed, but apparently that is not the case. Why?

2 answers

It turns out this is a known bug in MongoDB that still had not been fixed as of version 2.2.

MongoDB does not perform index intersection when searching for multiple values with $all. Only the first element of the array is looked up in the index, and every document matched that way is then scanned to filter the results.

For example, for the query db.documents.find({tags:{$all:["car","phone"]}}), MongoDB has to load and scan every document containing the tag "car". Since the collection in question contains more than one hundred thousand documents tagged "car", the slowdown is not surprising.

Worse, MongoDB does not even perform the simple optimization of picking the least frequent element of the $all array for the index lookup. If there are 100,000 documents tagged "car" and 10 documents tagged "phone", MongoDB will still scan 100,000 documents to return the results for {$all:["car", "phone"]}.
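If only the first element of the $all array is used for the index lookup, as described above, one possible workaround is to put the rarest tag first yourself. The following is only a minimal sketch under that assumption: orderTagsByRarity and countFor are hypothetical names, not part of MongoDB, and the counts are the illustrative figures from this answer.

```javascript
// Hypothetical helper (not a MongoDB API): returns a copy of `tags`
// sorted so that the tag with the fewest matching documents comes first.
// `countFor` is an assumed callback returning the document count for a tag.
function orderTagsByRarity(tags, countFor) {
  return tags.slice().sort(function (a, b) {
    return countFor(a) - countFor(b);
  });
}

// Illustrative counts taken from the example in this answer.
var exampleCounts = { car: 100000, phone: 10 };
var ordered = orderTagsByRarity(["car", "phone"], function (tag) {
  return exampleCounts[tag];
});
// `ordered` now lists "phone" before "car".

// In the mongo shell the counts could come from the indexed field, e.g.:
//   orderTagsByRarity(tags, function (t) { return db.documents.count({tags: t}); })
// with the result passed to db.documents.find({tags: {$all: ordered}}).
```

Each count is itself an extra query, so this only pays off when the tag frequencies differ widely, and whether it helps at all depends on the server version behaving as described.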

See also: https://jira.mongodb.org/browse/SERVER-1000


I just want to add that $in is fast. With only one criterion or keyword, $in is equivalent to $all, but $in is fast while $all is slower.

So use $in.
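Note that the two operators only match the same documents when a single value is given. A toy in-memory model of the matching rules (not MongoDB's implementation, and it says nothing about performance) makes the difference visible; matchesAll and matchesIn are hypothetical helper names.

```javascript
// Toy model of the matching rules only; not how MongoDB evaluates queries.
function matchesAll(docTags, wanted) {
  // $all: every requested value must be present in the array.
  return wanted.every(function (t) { return docTags.indexOf(t) !== -1; });
}

function matchesIn(docTags, wanted) {
  // $in: at least one requested value must be present in the array.
  return wanted.some(function (t) { return docTags.indexOf(t) !== -1; });
}

// Sample document from the question.
var doc = { text: "blah blah blah...", tags: ["car", "auto", "automobile"] };

var allOne = matchesAll(doc.tags, ["car"]);          // single value
var inOne = matchesIn(doc.tags, ["car"]);            // single value, same outcome
var allTwo = matchesAll(doc.tags, ["car", "phone"]); // needs both tags
var inTwo = matchesIn(doc.tags, ["car", "phone"]);   // needs either tag
```

With several values, swapping $all for $in changes which documents are returned, so the substitution is only safe for single-value queries.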


Source: https://habr.com/ru/post/1438165/