Mongo $ group too slow

I have a mongo db collection of about 168.2 million documents. I am trying to get the average of a specific field using $ group, and I use $ match before $ group in the pipeline to use the index on client.city. But it takes about 5 minutes to complete the request, which is very slow.

Here is what I tried:

db.ar12.aggregate( {$match:{'client.city':'New York'}}, {'$group':{'_id':'client.city', 'avg':{'$avg':'$length'}}} ) db.ar12.aggregate( {$match:{'client.city':'New York'}}, {'$group':{'_id':null, 'avg':{'$avg':'$length'}}} ) db.ar12.aggregate( {$match:{'client.city':'New York'}}, {$project: {'length':1}}, {'$group':{'_id':null, 'avg':{'$avg':'$length'}}} ) 

All three queries take approximately the same time, the number of documents from client.city = to New York is 1,231,672, find({'client.city':'New York').count() takes the second place to run

 > db.version() 3.2.0 

EDIT

Here's the result of the explanation ... Regarding the comment for adding a composite index with a length, will this help, although I am not looking in length, I want all longitudes ...

 { "waitedMS" : NumberLong(0), "stages" : [ { "$cursor" : { "query" : { "client.city" : "New York" }, "fields" : { "length" : 1, "_id" : 1 }, "queryPlanner" : { "plannerVersion" : 1, "namespace" : "clients.ar12", "indexFilterSet" : false, "parsedQuery" : { "client.city" : { "$eq" : "New York" } }, "winningPlan" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "client.city" : 1 }, "indexName" : "client.city_1", "isMultiKey" : false, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 1, "direction" : "forward", "indexBounds" : { "client.city" : [ "[\"New York\", \"New York\"]" ] } } }, "rejectedPlans" : [ ] } } }, { "$project" : { "length" : true } }, { "$group" : { "_id" : { "$const" : null }, "total" : { "$avg" : "$length" } } } ], "ok" : 1 } 

EDIT 2

I added the client.city and length composite index, but to no avail the speed is still too slow, I tried these 2 queries:

 db.ar12.aggregate( {$match: {'client.city':'New York'}}, {$project: {'client.city':1, 'length':1}}, {'$group':{'_id':'$client.city', 'avg':{'$avg':'$length'}}} ) 

The query above did not use a composite index, so I tried this to force it, and so far nothing has changed:

 db.ar12.aggregate( {$match: { $and : [{'client.city':'New York'}, {'length':{'$gt':0}}]}}, {$project: {'client.city':1, 'length':1}}, {'$group':{'_id':'$client.city', 'avg':{'$avg':'$length'}}} ) 

The following is an explanation of the last request:

 { "waitedMS" : NumberLong(0), "stages" : [ { "$cursor" : { "query" : { "$and" : [ { "client.city" : "New York" }, { "length" : { "$gt" : 0 } } ] }, "fields" : { "client.city" : 1, "length" : 1, "_id" : 1 }, "queryPlanner" : { "plannerVersion" : 1, "namespace" : "clients.ar12", "indexFilterSet" : false, "parsedQuery" : { "$and" : [ { "client.city" : { "$eq" : "New York" } }, { "length" : { "$gt" : 0 } } ] }, "winningPlan" : { "stage" : "CACHED_PLAN", "inputStage" : { "stage" : "FETCH", "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "client.city" : 1, "length" : 1 }, "indexName" : "client.city_1_length_1", "isMultiKey" : false, "isUnique" : false, "isSparse" : false, "isPartial" : false, "indexVersion" : 1, "direction" : "forward", "indexBounds" : { "client.city" : [ "[\"New York\", \"New York\"]" ], "length" : [ "(0.0, inf.0]" ] } } } }, "rejectedPlans" : [ ] } } }, { "$project" : { "client" : { "city" : true }, "length" : true } }, { "$group" : { "_id" : "$client.city", "avg" : { "$avg" : "$length" } } } ], "ok" : 1 } 
+5
source share
1 answer

I found work around, the length goes from 1 to 70. So I did in python, iterating from 1 to 70, and found the amount of each length for each city,

 db.ar12.find({'client.city':'New York', 'length':i}).count() 

which is very fast and then calculates the average in python, it takes about 2 seconds to run.

This is not the best solution, since I have other requests to run, I don’t know if I can find work for all of them ...

0
source

Source: https://habr.com/ru/post/1240637/


All Articles