MongoDB: Sorting by the Average of Combined Numbers or Nested Subarrays

Question

I'm having trouble working out the best way to do this in MongoDB. Arguably this is a relational dataset, so I will probably be called out for that. Still, the challenge is to see whether it is possible.

Currently, I need to sort logistics managers by the daily average mileage across all the vans in their department, and also, in a separate list, by a combined weekly average.

My first setup in the database was as follows:

    {
        "_id" : ObjectId("555cf04fa3ed8cc2347b23d7"),
        "name" : "My Manager 1",
        "vans" : [
            { "name" : "van1", "miles" : NumberLong(56) },
            { "name" : "van2", "miles" : NumberLong(34) }
        ]
    }

But I don't see how to sort by a value inside a nested array without knowing the array keys (they will be the standard 0 to x indexes).

So my next option was to give up on this idea, keep only the name in the first collection, and put the vans in a second collection along with the manager's ID.

So, removing the vans from the above example and adding this collection (vans):

    {
        "_id" : ObjectId("555cf04fa3ed8cc2347b23d9"),
        "name" : "van1",
        "miles" : NumberLong(56),
        "manager_id" : "555cf04fa3ed8cc2347b23d7"
    }

But since I need to show the results by manager, how do I sort (if possible) by the average miles in this collection where manager_id = x, and then display the manager by its ID?

Thanks for the help.

+6

arrays php mongodb

deejuk May 21 '15 at 7:37

2 answers

First, do you need average miles for one day, average miles over a certain period of time, or average miles over a manager's lifetime? I would consider adding a timestamp field. Yes, _id contains a timestamp, but it only reflects the time the document was created, not necessarily the time of the original daily log entry.
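To illustrate the point about _id: the first four bytes of an ObjectId hold its creation time in seconds since the Unix epoch. The following is a minimal sketch in plain JavaScript, with no driver involved, that reads this value from the hex string in the question. The helper name objectIdToDate is hypothetical and not part of any MongoDB API.

```javascript
// Hypothetical helper: reads the creation time embedded in an ObjectId hex string.
// The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// _id of the manager document from the question
const created = objectIdToDate("555cf04fa3ed8cc2347b23d7");
console.log(created.toISOString());
```

This is the document's creation time only, so a separate timestamp field is still needed if a log entry can be recorded later than the day it describes.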

Considerations for the first data model:

Does each document mean one day or one manager?
How many vans do you expect to have in the array? Does this list grow over time? Do you need to worry about reaching the 16 MB maximum document size within a year or two?

Considerations for the second data model:

Can you store the manager name in the "manager_id" field? Could it serve as a unique identifier for a secondary metadata lookup? That would remove the need for a second lookup of the manager's metadata just to get the name.

As pointed out by @n9code, the aggregation framework is the answer in both cases.

For the first data model, assuming each document represents one day, and you want to get the average value for a given day or range of days:

    db.collection.aggregate([
        // Keep only this manager's documents within the date range
        { $match: { name: 'My Manager 1', timestamp: { $gte: ISODate(...), $lt: ISODate(...) } } },
        // One document per van
        { $unwind: '$vans' },
        // After $unwind the mileage lives at vans.miles
        { $group: {
            _id: { _id: '$_id', name: '$name', timestamp: '$timestamp' },
            avg_mileage: { $avg: '$vans.miles' }
        } },
        { $sort: { avg_mileage: -1 } },
        { $project: { _id: '$_id._id', name: '$_id.name', timestamp: '$_id.timestamp', avg_mileage: 1 } }
    ]);

If, in the first data model, each document represents a manager and the "vans" array grows daily, then this data model is not ideal, for two reasons:

The vans array can eventually exceed the maximum document size, although that would take a lot of data.
It is much harder and more expensive to restrict results to a date range, because the timestamp is then nested inside each "vans" element rather than at the document root.

For completeness, here is the query:

    /* Assuming data model is:
       { _id: ..., name: ..., vans: [ { name: ..., miles: ..., timestamp: ... } ] } */
    db.collection.aggregate([
        { $match: { name: 'My Manager 1' } },
        { $unwind: '$vans' },
        // The date filter can only run after $unwind, on each van entry
        { $match: { 'vans.timestamp': { $gte: ISODate(...), $lt: ISODate(...) } } },
        { $group: {
            _id: { _id: '$_id', name: '$name' },
            avg_mileage: { $avg: '$vans.miles' }
        } },
        { $sort: { avg_mileage: -1 } },
        { $project: { _id: '$_id._id', name: '$_id.name', avg_mileage: 1 } }
    ]);

For the second data model, the aggregation is simpler. I am assuming a timestamp field has been added:

    db.collection.aggregate([
        { $match: {
            // Use a plain string here if manager_id is stored as a string
            manager_id: ObjectId('555cf04fa3ed8cc2347b23d7'),
            timestamp: { $gte: ISODate(...), $lt: ISODate(...) }
        } },
        { $group: {
            _id: '$manager_id',
            avg_mileage: { $avg: '$miles' },
            names: { $addToSet: '$name' }
        } },
        { $sort: { avg_mileage: -1 } },
        { $project: { manager_id: '$_id', avg_mileage: 1, names: 1 } }
    ]);

I added an array of the names (vans?) that contributed to the average.
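The question also asks how to display the manager once the averages are grouped by manager_id. One possible approach, sketched here in plain JavaScript under the assumption that the grouped rows and the manager documents have both already been fetched, is to attach the names in application code. All sample values, the second manager, and the names averages, managerDocs, and report are hypothetical.

```javascript
// Hypothetical rows, shaped like the output of grouping vans by manager_id.
const averages = [
  { manager_id: "555cf04fa3ed8cc2347b23d7", avg_mileage: 45, names: ["van1", "van2"] },
  { manager_id: "555cf04fa3ed8cc2347b23d8", avg_mileage: 75, names: ["van3"] },
];

// Hypothetical manager documents, fetched separately from the managers collection.
const managerDocs = [
  { _id: "555cf04fa3ed8cc2347b23d7", name: "My Manager 1" },
  { _id: "555cf04fa3ed8cc2347b23d8", name: "My Manager 2" },
];

// Index managers by id so each row needs only one map lookup.
const nameById = new Map(managerDocs.map((m) => [m._id, m.name]));

// Sort a copy in descending order of average mileage, then attach the name.
const report = averages
  .slice()
  .sort((a, b) => b.avg_mileage - a.avg_mileage)
  .map((row) => ({
    manager: nameById.get(row.manager_id) || "(unknown manager)",
    avg_mileage: row.avg_mileage,
  }));

console.log(report);
```

This costs a second query, which is the trade-off of the second data model. Storing manager_id in the same type as the manager's _id keeps the lookup keys comparable.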

Relevant documentation:

$match, $unwind, $group, $sort, $project - aggregation pipeline stages
$avg, $addToSet - group accumulator operators
Date types
ObjectId.getTimestamp

+1

zamnuts May 21 '15 at 10:34

bagrat · Accepted Answer · 2015-05-21T08:47:38+0000

If a Manager will have a limited number of Vans, then your first approach is better, since you do not need to make two separate queries to the database to collect your information.

Then the question arises of how to calculate the average mileage per Manager, and this is where the Aggregation Framework will help you a lot. Here is a query that will give you the data you need:

    db.manager.aggregate([
        {$unwind: "$vans"},
        {$group: {
            _id: { _id: "$_id", name: "$name" },
            avg_milage: {$avg: "$vans.miles"}
        }},
        {$sort: {"avg_milage": -1}},
        {$project: {_id: "$_id._id", name: "$_id.name", avg_milage: "$avg_milage"}}
    ])

The first stage, $unwind, deconstructs the vans array and creates a separate document for each element of the array.

Then the $group stage takes all documents with the same (_id, name) pair and, in the avg_milage field, computes the average of the vans.miles field across those documents.

The $sort stage is straightforward: it sorts the documents in descending order, using the new avg_milage field as the sort key.

Finally, the last stage, $project, simply tidies up the documents by creating the appropriate projections, just for neatness :)
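To make the four stages concrete, here is a rough in-memory sketch of the same steps in plain JavaScript. It only imitates what the pipeline does and is not how MongoDB implements it. The first manager mirrors the question's sample document, while "My Manager 2", the short ids, and the variable names are hypothetical.

```javascript
// In-memory imitation of $unwind -> $group -> $sort -> $project.
const managers = [
  { _id: "m1", name: "My Manager 1", vans: [{ name: "van1", miles: 56 }, { name: "van2", miles: 34 }] },
  { _id: "m2", name: "My Manager 2", vans: [{ name: "van3", miles: 80 }, { name: "van4", miles: 70 }] },
];

// $unwind: one document per element of the vans array.
const unwound = managers.flatMap((m) =>
  m.vans.map((van) => ({ _id: m._id, name: m.name, vans: van }))
);

// $group: collect the miles for each (_id, name) pair.
const groups = new Map();
for (const doc of unwound) {
  const key = JSON.stringify([doc._id, doc.name]);
  if (!groups.has(key)) {
    groups.set(key, { _id: doc._id, name: doc.name, miles: [] });
  }
  groups.get(key).miles.push(doc.vans.miles);
}

// $project (flatten each group into one row) and $sort (descending average).
const result = [...groups.values()]
  .map((g) => ({
    _id: g._id,
    name: g.name,
    avg_milage: g.miles.reduce((sum, x) => sum + x, 0) / g.miles.length,
  }))
  .sort((a, b) => b.avg_milage - a.avg_milage);

console.log(result);
```

With this sample data, "My Manager 2" (average 75) is listed before "My Manager 1" (average 45).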

A similar thing is needed for your second desired result:

    db.manager.aggregate([
        {$unwind: "$vans"},
        {$group: {
            _id: { _id: "$_id", name: "$name" },
            total_milage: {$sum: "$vans.miles"}
        }},
        {$sort: {"total_milage": -1}},
        {$project: {
            _id: "$_id._id",
            name: "$_id.name",
            weekly_milage: { $multiply: [ "$total_milage", 7 ] }
        }}
    ])

This will return a list of Managers with their weekly mileage, sorted in descending order. You can then add a $limit stage to the result to get, for example, the Manager with the highest mileage.

And in the same way, you can get information about your vans:

    db.manager.aggregate([
        {$unwind: "$vans"},
        {$group: {
            _id: "$vans.name",
            total_milage: {$sum: "$vans.miles"}
        }},
        {$sort: {"total_milage": -1}},
        {$project: {
            van_name: "$_id",
            weekly_milage: { $multiply: [ "$total_milage", 7 ] }
        }}
    ])
