Calculate average field value in embedded documents / array

I want to calculate the rating_average field of this document from the rating fields inside its ratings array. Can you help me understand how to use aggregation with $avg?

    {
        "title": "The Hobbit",
        "rating_average": "???",
        "ratings": [
            { "title": "best book ever", "rating": 5 },
            { "title": "good book", "rating": 3.5 }
        ]
    }
+6
3 answers

The aggregation framework in MongoDB 3.4 and newer provides the $reduce operator, which efficiently calculates the total without the need for additional pipeline stages. Use it as an expression to return the sum of the ratings, and get the number of ratings with $size. Inside $addFields, the average can then be calculated with the arithmetic operator $divide, following the formula average = total of ratings / number of ratings:

    db.collection.aggregate([
        {
            "$addFields": {
                "rating_average": {
                    "$divide": [
                        { // expression returns total
                            "$reduce": {
                                "input": "$ratings",
                                "initialValue": 0,
                                "in": { "$add": ["$$value", "$$this.rating"] }
                            }
                        },
                        { // expression returns ratings count
                            "$cond": [
                                { "$ne": [ { "$size": "$ratings" }, 0 ] },
                                { "$size": "$ratings" },
                                1
                            ]
                        }
                    ]
                }
            }
        }
    ])

Output result

    {
        "_id" : ObjectId("58ab48556da32ab5198623f4"),
        "title" : "The Hobbit",
        "ratings" : [
            { "title" : "best book ever", "rating" : 5.0 },
            { "title" : "good book", "rating" : 3.5 }
        ],
        "rating_average" : 4.25
    }
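To make the arithmetic in that stage easier to follow, here is a minimal plain-JavaScript sketch of the same three steps (sum, guarded count, division). It does not talk to MongoDB; averageRating is a hypothetical helper written only for this illustration, and it assumes ratings is always an array.

```javascript
// Plain-JavaScript analogue of the $addFields stage above (illustration only;
// averageRating is a hypothetical helper, not a MongoDB API).
function averageRating(doc) {
  // Like $reduce with initialValue 0 and $add: sum every rating.
  const total = doc.ratings.reduce((value, item) => value + item.rating, 0);
  // Like the $cond guard: fall back to 1 as the divisor for an empty array.
  const count = doc.ratings.length !== 0 ? doc.ratings.length : 1;
  // Like $divide: total / count.
  return total / count;
}

const hobbit = {
  title: "The Hobbit",
  ratings: [
    { title: "best book ever", rating: 5 },
    { title: "good book", rating: 3.5 },
  ],
};

console.log(averageRating(hobbit)); // 4.25
console.log(averageRating({ title: "Unrated", ratings: [] })); // 0
```

The second call shows what the guard buys you: with no ratings the total is 0 and the divisor is 1, so the result is 0 rather than a division by zero.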

In older versions, you first need to apply the $unwind operator to the ratings array field as the initial stage of the aggregation pipeline. This deconstructs the ratings array from the input documents and outputs one document per element. Each output document replaces the array with a single element value.
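As a rough picture of what $unwind produces, the following plain-JavaScript sketch does the same reshaping in memory. unwindRatings is a hypothetical helper for this illustration only, and it assumes every document has a ratings array.

```javascript
// Illustrative stand-in for { "$unwind": "$ratings" }; unwindRatings is a
// hypothetical helper written for this sketch, not part of MongoDB.
function unwindRatings(docs) {
  // One output document per array element; the array field is replaced
  // by that single element. Documents with an empty array yield nothing.
  return docs.flatMap((doc) =>
    doc.ratings.map((rating) => ({ ...doc, ratings: rating }))
  );
}

const docs = [
  {
    _id: 1,
    title: "The Hobbit",
    ratings: [
      { title: "best book ever", rating: 5 },
      { title: "good book", rating: 3.5 },
    ],
  },
];

const unwound = unwindRatings(docs);
console.log(JSON.stringify(unwound, null, 2));
```

Each of the two resulting documents keeps _id and title, while ratings now holds one embedded document instead of the array.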

The second stage of the pipeline is the $group operator, which groups the input documents by the _id and title keys and applies the $avg accumulator expression to each group to calculate the average. A second accumulator, $push, preserves the original ratings array by returning an array of all the values that result from applying its expression to each document in the group.

The final pipeline stage is the $project operator, which reshapes each document in the stream, for example by including the new ratings_average field.
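The grouping and projection steps just described can be sketched in plain JavaScript as follows. This is only an in-memory illustration that starts from documents already in their unwound shape; groupAndProject is a hypothetical helper, and the numeric _id values are made up for the example.

```javascript
// Documents as they would look after $unwind (one rating per document).
// The _id value 1 is a made-up placeholder for this sketch.
const unwoundDocs = [
  { _id: 1, title: "The Hobbit", ratings: { title: "best book ever", rating: 5 } },
  { _id: 1, title: "The Hobbit", ratings: { title: "good book", rating: 3.5 } },
];

// Rough equivalent of $group on { _id, title } with $push and $avg,
// followed by a $project that exposes title at the top level.
function groupAndProject(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify([doc._id, doc.title]);
    if (!groups.has(key)) {
      groups.set(key, { title: doc.title, ratings: [], sum: 0 });
    }
    const group = groups.get(key);
    group.ratings.push(doc.ratings); // like $push
    group.sum += doc.ratings.rating; // running total used for the average
  }
  // Like $project: drop the grouping key, keep title, ratings and the average.
  return [...groups.values()].map((g) => ({
    title: g.title,
    ratings: g.ratings,
    ratings_average: g.sum / g.ratings.length,
  }));
}

const result = groupAndProject(unwoundDocs);
console.log(result[0].ratings_average); // 4.25
```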

So if, for example, you have this sample document in your collection (the same one as above):

    db.collection.insert({
        "title": "The Hobbit",
        "ratings": [
            { "title": "best book ever", "rating": 5 },
            { "title": "good book", "rating": 3.5 }
        ]
    })

To calculate the average value of the ratings array and project the value into another ratings_average field, you can use the following aggregation pipeline:

    db.collection.aggregate([
        { "$unwind": "$ratings" },
        {
            "$group": {
                "_id": { "_id": "$_id", "title": "$title" },
                "ratings": { "$push": "$ratings" },
                "ratings_average": { "$avg": "$ratings.rating" }
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": "$_id.title",
                "ratings_average": 1,
                "ratings": 1
            }
        }
    ])

Result

    /* 1 */
    {
        "result" : [
            {
                "ratings" : [
                    { "title" : "best book ever", "rating" : 5 },
                    { "title" : "good book", "rating" : 3.5 }
                ],
                "ratings_average" : 4.25,
                "title" : "The Hobbit"
            }
        ],
        "ok" : 1
    }
+9

That really could be written much shorter, and this was true even at the time of writing. If you want an "average", just use $avg:

    db.collection.aggregate([
        { "$addFields": {
            "rating_average": { "$avg": "$ratings.rating" }
        }}
    ])

The reason for this is that as of MongoDB 3.2, the $avg operator gained two "things":

  • The ability to process an "array" of arguments in "expression" form, rather than acting solely as an accumulator for $group

  • It benefits from the MongoDB 3.2 features that allow a "shorthand" notation for array expressions, either in composition:

     { "array": [ "$fielda", "$fieldb" ] } 

    or to denote a single property from an array as an array of the values of that property:

     { "$avg": "$ratings.rating" } // equal to { "$avg": [ 5, 3.5 ] } 

In earlier versions, you would need to use $map to access the "rating" property inside each element of the array. Now you don't.
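For comparison, here is a sketch of the two notations side by side, written as plain JavaScript object literals so they can be inspected without a server. The variable name item inside $map is an arbitrary choice for this example, and the claim that both stages produce the same value is an expectation based on the description above, not something verified here against a live database.

```javascript
// Shorthand form from the answer: the path "$ratings.rating" is expected
// to resolve to the array of rating values, [5, 3.5] for the sample document.
const shorthandStage = {
  $addFields: { rating_average: { $avg: "$ratings.rating" } },
};

// Explicit form: $map pulls "rating" out of each element first.
// The variable name "item" is an arbitrary choice for this sketch.
const explicitMapStage = {
  $addFields: {
    rating_average: {
      $avg: { $map: { input: "$ratings", as: "item", in: "$$item.rating" } },
    },
  },
};

// Plain-JavaScript analogue of what both forms compute for the sample document.
const sampleRatings = [
  { title: "best book ever", rating: 5 },
  { title: "good book", rating: 3.5 },
];
const values = sampleRatings.map((item) => item.rating);
const average = values.reduce((sum, v) => sum + v, 0) / values.length;
console.log(values, average);
```

Either stage object could then be passed in an array to db.collection.aggregate, as in the examples above.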


For the record, even the $reduce usage can be simplified:

    db.collection.aggregate([
        { "$addFields": {
            "rating_average": {
                "$reduce": {
                    "input": "$ratings",
                    "initialValue": 0,
                    "in": {
                        "$add": [
                            "$$value",
                            { "$divide": [
                                "$$this.rating",
                                { "$size": { "$ifNull": [ "$ratings", [] ] } }
                            ]}
                        ]
                    }
                }
            }
        }}
    ])

But as stated, this really just reimplements the existing $avg functionality, so since that operator is available, it is the one that should be used.
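The simplified $reduce works because dividing each rating by the count and then summing gives the same result as summing first and dividing once. A small plain-JavaScript sketch of that identity, using the sample ratings (the helper names are made up for this illustration):

```javascript
// Sum first, then divide once (the shape of the first answer's $divide).
function sumThenDivide(ratings) {
  return ratings.reduce((total, r) => total + r, 0) / ratings.length;
}

// Divide each rating by the count while summing (the simplified $reduce).
function divideThenSum(ratings) {
  return ratings.reduce((total, r) => total + r / ratings.length, 0);
}

const sample = [5, 3.5];
console.log(sumThenDivide(sample), divideThenSum(sample)); // 4.25 4.25

// With an empty array the per-element division never runs, so the
// reduction simply returns its initial value and no zero divisor appears.
console.log(divideThenSum([])); // 0
```

For arbitrary floating-point inputs the two orders of operation can differ in the last digits, which is one more reason to prefer the built-in $avg.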

+3

Since the data you want to average is inside an array, you need to unwind it first. Do this with $unwind in the aggregation pipeline:

 {$unwind: "$ratings"} 

After that, each element of the array is available as an embedded document under the ratings key in the resulting documents. Then you just need to $group by title and calculate $avg:

 {$group: {_id: "$title", ratings: {$push: "$ratings"}, average: {$avg: "$ratings.rating"}}} 

Then just restore the title field:

 {$project: {_id: 0, title: "$_id", ratings: 1, average: 1}} 

So, here is the resulting aggregation pipeline:

    db.yourCollection.aggregate([
        {$unwind: "$ratings"},
        {$group: {
            _id: "$title",
            ratings: {$push: "$ratings"},
            average: {$avg: "$ratings.rating"}
        }},
        {$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}
    ])
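One design point worth checking before using this pipeline: it groups by title alone, whereas the earlier pipeline grouped by _id and title together. The plain-JavaScript sketch below illustrates the difference under the assumption that two distinct documents share a title; the second document and the _id values are hypothetical, and averageBy is a helper made up for this example.

```javascript
// Unwound documents; the entry with _id 2 is a hypothetical second book
// that happens to share the same title.
const unwoundSample = [
  { _id: 1, title: "The Hobbit", ratings: { rating: 5 } },
  { _id: 1, title: "The Hobbit", ratings: { rating: 3.5 } },
  { _id: 2, title: "The Hobbit", ratings: { rating: 1 } },
];

// Average the ratings per group, where keyOf decides the grouping key.
function averageBy(docs, keyOf) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    const group = groups.get(key) || { sum: 0, count: 0 };
    group.sum += doc.ratings.rating;
    group.count += 1;
    groups.set(key, group);
  }
  return [...groups.entries()].map(([key, g]) => ({
    key,
    average: g.sum / g.count,
  }));
}

// Grouping by title merges both books into a single average.
const byTitle = averageBy(unwoundSample, (d) => d.title);
// Grouping by _id and title keeps one average per original document.
const byIdAndTitle = averageBy(unwoundSample, (d) => `${d._id}|${d.title}`);

console.log(byTitle.length, byIdAndTitle.length); // 1 2
```

If titles are unique in your collection the two approaches give the same groups; otherwise including _id in the group key keeps the averages per document.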
+2

Source: https://habr.com/ru/post/987377/

