MongoDB is very slow when counting null values (or {$exists: false})

I have a Mongo server running on a VPS with 16 GB of memory (though probably with slow I/O, since it uses magnetic disks).

I have a collection of approximately 35 million records that does not fit into main memory (db.stats() reports a size of 35 GB and a storageSize of 14 GB); however, the 1.7 GB totalIndexSize should fit comfortably.

There is one particular field, bg, which is either present with the value true or absent (please do not debate whether this is the best way to represent the data; I still think Mongo is behaving strangely). This field has a non-sparse index, which is reported as 146 MB in size.

I use the WiredTiger storage engine with the default cache size (so it should be around 8 GB).

I am trying to count the number of documents in which the bg field is missing.

Counting the true values is reasonably fast (a few seconds):

 > db.entities.find({bg: true}).count()
 8300677

However, querying for missing values is extremely slow (about 5 minutes):

 > db.entities.find({bg: null}).count()
 27497706

As far as I can tell, explain() looks fine:

 > db.entities.find({bg: null}).explain()
 {
   "queryPlanner" : {
     "plannerVersion" : 1,
     "namespace" : "testdb.entities",
     "indexFilterSet" : false,
     "parsedQuery" : { "bg" : { "$eq" : null } },
     "winningPlan" : {
       "stage" : "FETCH",
       "filter" : { "bg" : { "$eq" : null } },
       "inputStage" : {
         "stage" : "IXSCAN",
         "keyPattern" : { "bg" : 1 },
         "indexName" : "bg_1",
         "isMultiKey" : false,
         "direction" : "forward",
         "indexBounds" : { "bg" : [ "[null, null]" ] }
       }
     },
     "rejectedPlans" : [ ]
   },
   "serverInfo" : {
     "host" : "mongo01",
     "port" : 27017,
     "version" : "3.0.3",
     "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105"
   },
   "ok" : 1
 }

However, the query remains stubbornly slow even after repeated runs. Count queries for other values are fast:

 > db.entities.find({bg: "foo"}).count()
 0
 > db.entities.find({}).count()
 35798383

I find this strange because, as I understand it, missing fields are simply stored as null in non-sparse indexes, so counting nulls should cost about the same as counting an actual value (or perhaps up to three times as much as counting the true values, since it has roughly three times as many index entries to read). Indeed, this answer reports large speed improvements for similar queries involving null and .count(). The only difference I can think of is WiredTiger.

Can someone explain why my query counting null values is so slow, or what I can do to fix it (besides the obvious approach of subtracting the true count from the total, which works fine but does not satisfy my curiosity)?
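The subtraction workaround mentioned above can be wrapped in a small shell helper. This is a minimal sketch, assuming (as in the question) that bg is only ever true or absent; `countMissing` is a hypothetical name, and it relies on the legacy `collection.count(query)` shell method. With the numbers above, 35798383 - 8300677 = 27497706, which matches the slow `{bg: null}` count exactly.

```javascript
// Hypothetical helper (not part of MongoDB): counts documents where `field`
// is missing (or null) by subtracting the number of documents where it is
// `true` from the total. Only valid if the field is either `true` or absent;
// any other stored value, such as false, would be miscounted as "missing".
function countMissing(collection, field) {
  var present = {};
  present[field] = true;                      // e.g. { bg: true }
  var total = collection.count({});           // fast in the question
  var withValue = collection.count(present);  // fast in the question
  return total - withValue;
}

// Usage in the mongo shell:
//   countMissing(db.entities, "bg")
```

Both counts it issues are the kinds of query the question reports as fast, so the result should arrive in seconds rather than minutes, at the cost of depending on the true-or-absent assumption.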

1 answer

This is expected behavior; see https://jira.mongodb.org/browse/SERVER-18653. It seems like a strange decision to me, but there you go; I'm sure the programmers responsible for it know more about MongoDB than I do.

You will need to use a different value to represent null. Which value depends on what you use the field for. In my case the field is a foreign key, so I'll just start using false to mean null. If you use it to store a boolean, you may need to use "null", -1, 0, etc. instead.
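A minimal sketch of this suggestion, assuming false is chosen as the sentinel: a one-off backfill that sets bg to false on every document that lacks the field. `backfillSentinel` is a hypothetical helper name; it uses the legacy `update(query, update, {multi: true})` form available in MongoDB 3.0 (3.2 and later also offer `updateMany`). The backfill itself has to find the documents with the field missing, so it will presumably be slow on this data set, and the application must also start writing the sentinel for new documents.

```javascript
// Hypothetical one-off migration: give every document without `field` an
// explicit sentinel value so the "missing" case becomes a real indexed value.
function backfillSentinel(collection, field, sentinel) {
  var query = {};
  query[field] = { $exists: false };  // e.g. { bg: { $exists: false } }
  var set = {};
  set[field] = sentinel;              // e.g. { bg: false }
  // multi: true updates all matching documents, not just the first one.
  return collection.update(query, { $set: set }, { multi: true });
}

// Usage in the mongo shell:
//   backfillSentinel(db.entities, "bg", false)
//   db.entities.count({bg: false})
```

After the backfill, counting `{bg: false}` is a count of an actual indexed value, the same kind of query as the fast `{bg: true}` count in the question.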


Source: https://habr.com/ru/post/987482/

