MongoDB query is exceptionally slow

My MongoDB collection is pretty simple: each document has about 30 properties on three nesting levels. One such document is about 5,000 characters. I have 500,000 of them. When I execute the following query ...

db.images.find({ "featureData.cedd": { $exists: false}}).count() 

... it is very slow. The field is not indexed, but still ... from my MySQL experience, a single query like this shouldn't take 20 minutes.

While the query runs (directly in the mongo shell), CPU usage sits at about 3% and there are still 2 GB of free memory.

Thanks for any clue about what I can do!

EDIT: explain() output for the query (without an index):

    db.images.find({ "featureData.cedd": { $exists: false }}).explain()
    {
        "cursor" : "BasicCursor",
        "nscanned" : 532537,
        "nscannedObjects" : 532537,
        "n" : 438,
        "millis" : 1170403,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : { }
    }
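To illustrate what that BasicCursor plan means, here is a minimal sketch in plain Python (not MongoDB itself; the documents and the `missing_nested` helper are made up for illustration): every one of the 532,537 documents must be deserialized and tested individually, so the cost is proportional to the whole collection, no matter how few documents match.

```python
def missing_nested(doc, path):
    """Return True if the dotted path is absent from the document."""
    cur = doc
    for key in path.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return True
        cur = cur[key]
    return False

# A toy stand-in for the 500k-document collection.
docs = [
    {"featureData": {"cedd": "0,3,15,..."}},  # has the field
    {"featureData": {}},                      # featureData exists, cedd is missing
    {"other": 1},                             # no featureData at all
]

# This loop over *all* documents is what "nscanned: 532537" corresponds to.
count = sum(missing_nested(d, "featureData.cedd") for d in docs)
print(count)  # 2
```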

iostat output:

    Linux 3.2.0-58-generic (campartex)   03/25/2014   _x86_64_   (2 CPU)

    avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
              34.93   0.01     0.25     0.48    0.00  64.33

    Device:  tps   kB_read/s  kB_wrtn/s    kB_read    kB_wrtn
    sda      2.08     103.79      11.26  172805914   18749067
    fd0      0.00       0.00       0.00        148          0

explain() output after adding an index:

    db.images.find({ "featureData.cedd": { $exists: false }}).explain()
    {
        "cursor" : "BtreeCursor featureData.cedd_1",
        "nscanned" : 438,
        "nscannedObjects" : 438,
        "n" : 438,
        "millis" : 2,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : true,
        "indexOnly" : false,
        "indexBounds" : { "featureData.cedd" : [ [ null, null ] ] }
    }
2 answers

In my case, adding an index improved the query speed by a factor of roughly 600,000. $exists: false matches via null values, so this only works efficiently as long as other documents do not often carry cedd: null as a legitimate value. In addition, the documents that have no cedd value are much smaller, which helps.
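The null-bucket behaviour can be sketched in plain Python (a toy value-to-ids index with hypothetical documents, not MongoDB's actual B-tree): documents missing the field and documents with an explicit null land under the same null key, which is why the indexed query resolves with indexBounds [[null, null]], and also why frequent explicit cedd: null values would spoil it.

```python
from collections import defaultdict

def get_nested(doc, path):
    """Resolve a dotted path; None stands in for a missing field,
    mirroring how the index files absent fields under the null key."""
    cur = doc
    for key in path.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur

def build_index(docs, path):
    index = defaultdict(list)  # indexed value -> list of document ids
    for i, d in enumerate(docs):
        index[get_nested(d, path)].append(i)
    return index

docs = [
    {"featureData": {"cedd": "a"}},    # field present
    {"featureData": {}},               # field absent
    {"featureData": {"cedd": None}},   # explicit null: same bucket as absent!
]
idx = build_index(docs, "featureData.cedd")
print(idx[None])  # [1, 2] -- absent and explicitly-null are indistinguishable
```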


TL;DR: reverse the logic: add a sparse index on a new has_cedd field that is either null or some constant (a low-selectivity index, not ideal, but made tolerable by being sparse), or, better yet, keep a global counter somewhere else that is updated on every write.

Indexing featureData.cedd itself sounds like a bad idea if it can hold up to 5,000 characters, because that far exceeds the maximum index key size, and apparently you are not interested in the data itself, only in whether it is present.

Oh, and why is it slow? Occasional ad-hoc queries like this are probably fine on their own: MongoDB can dedicate all its resources to this OLAP-like query, but doing so will stall any concurrent "regular" OLTP-style queries.


There are two problems here:

  • $exists : false is evil, and I doubt that indexing will help: indexes are built over data, while $exists is a meta query about the structure. An index on the field can serve $exists : true , because if an indexed value exists, the field itself must also exist in that document. Reversing this logic is hard: if the field does not exist, it is either not in the index at all or sits in an entry with extremely low selectivity. Full index scans of this kind are usually problematic, which is also true for queries using $ne .

  • MongoDB has to deserialize all 500k documents and test each one to evaluate $exists . You cannot compare this to MySQL, where you have a fixed table structure; in fact, $exists : false has no SQL equivalent, because the column MUST exist, otherwise your table is badly broken.


Source: https://habr.com/ru/post/987484/

