MongoDB counting aggregation of nested objects

I have a deeply nested set of MongoDB documents, and I want to count the number of subdocuments that match a given condition (in each document). For instance:

 {
   "_id": {"chr":"20","pos":"14371","ref":"A","alt":"G"},
   "studies": [
     {
       "study_id": "Study1",
       "samples": [
         { "sample_id":"NA00001", "formatdata":[ {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]} ] },
         { "sample_id":"NA00002", "formatdata":[ {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]} ] }
       ]
     }
   ]
 }
 {
   "_id": {"chr":"20","pos":"14372","ref":"T","alt":"AA"},
   "studies": [
     {
       "study_id": "Study3",
       "samples": [
         { "sample_id":"SAMPLE1", "formatdata":[ {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]} ] },
         { "sample_id":"SAMPLE2", "formatdata":[ {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]} ] }
       ]
     }
   ]
 }
 {
   "_id": {"chr":"20","pos":"14373","ref":"C","alt":"A"},
   "studies": [
     {
       "study_id": "Study3",
       "samples": [
         { "sample_id":"SAMPLE3", "formatdata":[ {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]} ] },
         { "sample_id":"SAMPLE7", "formatdata":[ {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]} ] }
       ]
     }
   ]
 }

I want to know how many subdocuments have GT equal to "1|0", which in this case is 1 in the first document, 2 in the second and 0 in the third. I tried $unwind and the aggregation functions, but I am obviously doing something wrong. When I try to count subdocuments by the "GT" field, mongo complains:

 db.collection.aggregate([{$group: {"$studies.samples.formatdata.GT":1,_id:0}}]) 

because my group field names cannot contain ".", but if I leave the path out:

 db.collection.aggregate([{$group: {"$GT":1,_id:0}}]) 

it complains that "$GT" cannot be the name of an operator.

Any ideas?

1 answer

You need to use $unwind when working with arrays, and here you need to do it three times, once for each level of nesting:

  db.collection.aggregate([
      // Unwind each array level so the nested fields can be grouped on
      { "$unwind": "$studies" },
      { "$unwind": "$studies.samples" },
      { "$unwind": "$studies.samples.formatdata" },

      // Group the results to obtain the count per GT value
      { "$group": {
          "_id": "$studies.samples.formatdata.GT",
          "count": { "$sum": 1 }
      }}
  ])
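To see what three $unwind stages followed by $group compute, here is a plain JavaScript sketch that mimics them in memory on the question's sample documents (trimmed to the fields used). It does not talk to MongoDB: `unwind`, `setPath` and `groupCount` are hypothetical helpers written for illustration, the real $unwind handles more edge cases, and the field is assumed to be spelled `formatdata` as in the sample data.

```javascript
// Sample documents from the question, trimmed to the fields used here.
const docs = [
  { _id: { pos: "14371" }, studies: [{ samples: [
    { formatdata: [{ GT: "1|0" }] },
    { formatdata: [{ GT: "0|0" }] },
  ] }] },
  { _id: { pos: "14372" }, studies: [{ samples: [
    { formatdata: [{ GT: "1|0" }] },
    { formatdata: [{ GT: "1|0" }] },
  ] }] },
  { _id: { pos: "14373" }, studies: [{ samples: [
    { formatdata: [{ GT: "0|0" }] },
    { formatdata: [{ GT: "0|0" }] },
  ] }] },
];

// Hypothetical helper: return a copy of obj with the value at keys replaced.
function setPath(obj, keys, value) {
  if (keys.length === 0) return value;
  const [head, ...rest] = keys;
  return { ...obj, [head]: setPath(obj[head], rest, value) };
}

// Hypothetical helper: simplified stand-in for one $unwind stage.
// Emits one copy of the document per element of the array at `path`;
// documents where the path is not an array are dropped.
function unwind(items, path) {
  const keys = path.split(".");
  return items.flatMap((item) => {
    const arr = keys.reduce((o, k) => o?.[k], item);
    return Array.isArray(arr) ? arr.map((el) => setPath(item, keys, el)) : [];
  });
}

// Hypothetical helper: stand-in for $group with { $sum: 1 }.
function groupCount(items, keyFn) {
  const counts = new Map();
  for (const item of items) {
    const key = keyFn(item);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const flat = unwind(
  unwind(unwind(docs, "studies"), "studies.samples"),
  "studies.samples.formatdata"
);
const byGT = groupCount(flat, (d) => d.studies.samples.formatdata.GT);

// Six flattened rows in total; "1|0" occurs 3 times and "0|0" 3 times.
console.log(flat.length, Object.fromEntries(byGT));
```

Note that this first pipeline counts across the whole collection, not per document, which is why both GT values come out as 3 here.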

Ideally, you also want to filter the input. You can do this with $match both before and after the $unwind stages, using a regular expression ($regex) to match documents where the value at that path begins with "1".

  db.collection.aggregate([
      // Match first to exclude documents where no array member matches
      { "$match": { "studies.samples.formatdata.GT": /^1/ } },

      // Unwind each array level so individual members can be filtered
      { "$unwind": "$studies" },
      { "$unwind": "$studies.samples" },
      { "$unwind": "$studies.samples.formatdata" },

      // Match again to keep only the matching array members
      { "$match": { "studies.samples.formatdata.GT": /^1/ } },

      // Group the results to obtain the matched count per document and GT value
      { "$group": {
          "_id": {
              "_id": "$_id",
              "key": "$studies.samples.formatdata.GT"
          },
          "count": { "$sum": 1 }
      }}
  ])
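One consequence of filtering with $match is worth keeping in mind: a document with no matching GT value is removed before grouping, so it never shows up with a count of 0. The plain JavaScript sketch below (not MongoDB code; `countMatches` is a hypothetical helper, and the data is the question's sample trimmed to the fields used) computes the per-document counts in memory, which can serve as a cross-check for the expected 1, 2 and 0.

```javascript
// Sample documents from the question, trimmed to the fields used here.
const docs = [
  { _id: { chr: "20", pos: "14371" }, studies: [{ study_id: "Study1", samples: [
    { sample_id: "NA00001", formatdata: [{ GT: "1|0" }] },
    { sample_id: "NA00002", formatdata: [{ GT: "0|0" }] },
  ] }] },
  { _id: { chr: "20", pos: "14372" }, studies: [{ study_id: "Study3", samples: [
    { sample_id: "SAMPLE1", formatdata: [{ GT: "1|0" }] },
    { sample_id: "SAMPLE2", formatdata: [{ GT: "1|0" }] },
  ] }] },
  { _id: { chr: "20", pos: "14373" }, studies: [{ study_id: "Study3", samples: [
    { sample_id: "SAMPLE3", formatdata: [{ GT: "0|0" }] },
    { sample_id: "SAMPLE7", formatdata: [{ GT: "0|0" }] },
  ] }] },
];

// Hypothetical helper: flatten the three array levels of one document
// and count the formatdata entries whose GT matches the pattern.
function countMatches(doc, pattern) {
  return doc.studies
    .flatMap((study) => study.samples)
    .flatMap((sample) => sample.formatdata)
    .filter((fd) => pattern.test(fd.GT)).length;
}

// Per-document counts, including documents with no match at all.
const perDoc = docs.map((doc) => ({
  pos: doc._id.pos,
  count: countMatches(doc, /^1/),
}));

// What survives the $match stages: zero-count documents are absent.
const pipelineLike = perDoc.filter((row) => row.count > 0);

console.log(perDoc);       // counts 1, 2 and 0 for positions 14371, 14372, 14373
console.log(pipelineLike); // only positions 14371 and 14372 remain
```

If the zero rows matter, they have to be added back on the client side or the counting has to be done without filtering documents out first.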

Note that in all cases the names prefixed with a dollar sign ("$") are "variables" that refer to properties of the document. They are "values", used as input on the right-hand side. The "keys" on the left-hand side must be specified as plain string keys. A variable cannot be used to name a key.
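As a small illustration of the key/value distinction, the sketch below contrasts the $group stage from the question with a corrected one. `invalidGroupKeys` is a hypothetical helper that only mirrors the two naming rules reported in the question's errors (no "." in a key, no leading "$"); it is not MongoDB's own validation, which checks more than this.

```javascript
// From the question: a "$"-prefixed field path used as an output key.
const badStage = {
  $group: { "$studies.samples.formatdata.GT": 1, _id: 0 },
};

// Corrected: plain key names on the left, field paths as values on the right.
const goodStage = {
  $group: {
    _id: "$studies.samples.formatdata.GT",
    count: { $sum: 1 },
  },
};

// Hypothetical helper: list the $group output keys that start with "$"
// or contain ".", the two problems mentioned in the question.
function invalidGroupKeys(stage) {
  return Object.keys(stage.$group).filter(
    (key) => key.startsWith("$") || key.includes(".")
  );
}

console.log(invalidGroupKeys(badStage));  // the "$studies..." key is flagged
console.log(invalidGroupKeys(goodStage)); // nothing is flagged
```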


Source: https://habr.com/ru/post/980905/

