MongoDB is very slow when counting null values (or {$exists: false})

Question

I have a Mongo server running on a VPS with 16 GB of memory (although probably with slow IO using magnetic disks).

I have a collection of approximately 35 million records that does not fit into main memory (db.stats() reports a size of 35GB and a storageSize of 14GB); however, the 1.7GB reported for totalIndexSize should fit comfortably.

There is a particular field, bg, that I am querying. It is either present with a value of true or absent (please do not discuss whether this is the best representation of the data - I still think Mongo is behaving strangely). This field is indexed with a non-sparse index, which has a reported size of 146MB.

I use the WiredTiger storage engine with the default cache size (so it should be around 8 GB).

I am trying to count the number of entries where the bg field is absent.

Counting the true values is tolerably fast (a few seconds):

 > db.entities.find({bg: true}).count()
 8300677

However, querying for missing values is extremely slow (about 5 minutes):

 > db.entities.find({bg: null}).count()
 27497706

To my eyes, the explain() output looks fine:

 > db.entities.find({bg: null}).explain()
 {
     "queryPlanner" : {
         "plannerVersion" : 1,
         "namespace" : "testdb.entities",
         "indexFilterSet" : false,
         "parsedQuery" : {
             "bg" : {
                 "$eq" : null
             }
         },
         "winningPlan" : {
             "stage" : "FETCH",
             "filter" : {
                 "bg" : {
                     "$eq" : null
                 }
             },
             "inputStage" : {
                 "stage" : "IXSCAN",
                 "keyPattern" : {
                     "bg" : 1
                 },
                 "indexName" : "bg_1",
                 "isMultiKey" : false,
                 "direction" : "forward",
                 "indexBounds" : {
                     "bg" : [
                         "[null, null]"
                     ]
                 }
             }
         },
         "rejectedPlans" : [ ]
     },
     "serverInfo" : {
         "host" : "mongo01",
         "port" : 27017,
         "version" : "3.0.3",
         "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105"
     },
     "ok" : 1
 }

However, the query remains stubbornly slow, even after repeated calls. Other count queries for different values are fast:

 > db.entities.find({bg: "foo"}).count()
 0
 > db.entities.find({}).count()
 35798383

I find this strange, since my understanding is that missing fields in non-sparse indexes are simply stored as null, so counting null should perform much like counting an actual value (or perhaps take up to three times as long as counting the true values, if it has to read more index entries or something similar). Indeed, this answer reports large speed improvements for similar queries involving null and .count(). The only point of difference I can think of is WiredTiger.

Can anyone explain why my query to count null values is so slow, or what I can do to fix it (apart from the obvious workaround of subtracting the true count from the total, which would work fine but would not satisfy my curiosity)?
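For reference, the subtraction workaround mentioned above can be sketched roughly as follows. This is only an illustration: `countMissingBg` is a hypothetical helper name, and it is only valid under the assumption stated in the question that bg is either true or absent.

```javascript
// Sketch of the subtraction workaround (hypothetical helper, not a MongoDB API).
// Assumption: every document either has bg === true or has no bg field at all.
function countMissingBg(coll) {
  // Count of all documents; no predicate is applied.
  var total = coll.find({}).count();
  // Count of documents with bg: true (the query that takes a few seconds above).
  var withTrue = coll.find({bg: true}).count();
  // Whatever is left must be the documents without a bg field.
  return total - withTrue;
}

// In the mongo shell this would be called as: countMissingBg(db.entities)
```

With the numbers shown above, 35798383 - 8300677 gives 27497706, which matches the result of the slow null count.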

+6

performance mongodb aggregation-framework

Andy MacKinlay May 19, '15 at 7:55

1 answer

msaspence · Accepted Answer · 2015-08-27T00:24:17+0000

This is expected behavior; see https://jira.mongodb.org/browse/SERVER-18653. It seems like a strange call to me, but there you go; I'm sure the programmers responsible know more about MongoDB than I do.

You will need to use a different value to stand for null. Which value will depend on what you use the field for. In my case, this is an external link, so I'll just start using false to mean null. If you use the field to store a boolean, you may need to use "null", -1, 0, etc.
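A minimal sketch of that approach for the collection in the question, assuming false is chosen as the stand-in value. The helper names `backfillBgFalse` and `countBgFalse` are hypothetical, and the one-off update still has to locate every document without bg, so it will probably be slow the first time it runs.

```javascript
// Hypothetical helpers illustrating the "explicit stand-in value" idea.
// Assumption: false is not otherwise used as a value of bg.
function backfillBgFalse(coll) {
  // One-off pass: give every document without a bg field an explicit false.
  // {multi: true} is needed so that update() touches all matching documents.
  return coll.update(
    {bg: {$exists: false}},
    {$set: {bg: false}},
    {multi: true}
  );
}

function countBgFalse(coll) {
  // After the backfill, this counts an explicit value, like the bg: true query.
  return coll.find({bg: false}).count();
}

// In the mongo shell:
//   backfillBgFalse(db.entities)
//   countBgFalse(db.entities)
```

New documents would also have to be written with bg: false rather than with the field omitted, otherwise the count drifts out of date.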
