First, do you need the average mileage for a single day, for a given period of time, or over a manager's entire lifetime? I would consider adding a timestamp field. Yes, _id contains a timestamp, but it only reflects when the document was created, not necessarily the time of the original daily log.
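To illustrate why the _id timestamp is only a creation time, here is a minimal sketch that decodes it by hand. The first 4 bytes of an ObjectId are the seconds since the Unix epoch at which the id was generated. The helper name objectIdToDate is hypothetical; in the mongo shell, ObjectId(...).getTimestamp() gives the same value.

```javascript
// Sketch: recover the creation time embedded in an ObjectId's hex string.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
// objectIdToDate is a hypothetical helper, not part of any driver.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error('expected a 24-character hex ObjectId string');
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The id used later in this answer, decoded to its creation time.
const created = objectIdToDate('555cf04fa3ed8cc2347b23d7');
console.log(created.toISOString());
```

If a log for Monday is inserted on Wednesday, this value says Wednesday, which is why an explicit timestamp field is safer for date-range queries.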
Considerations for the first data model:
- Does each document represent a single day or a single manager?
- How many vans do you expect to have in the array? Does the list grow over time? Do you need to worry about hitting the 16 MB maximum document size within a year or two?
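A back-of-the-envelope estimate can answer the size question. The sketch below assumes made-up figures for bytes per array entry and entries added per day; measure your real documents before relying on it.

```javascript
// Rough estimate of how long a growing "vans" array takes to reach the
// 16 MB BSON document size limit.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// ASSUMPTION: bytesPerEntry and entriesPerDay are illustrative inputs,
// not measurements from the data model in the question.
function daysUntilLimit(bytesPerEntry, entriesPerDay) {
  const maxEntries = Math.floor(MAX_DOCUMENT_BYTES / bytesPerEntry);
  return Math.floor(maxEntries / entriesPerDay);
}

const days = daysUntilLimit(100, 10);
console.log(`${days} days (~${(days / 365).toFixed(1)} years)`);
```

With small entries and a handful of vans the limit is decades away, but larger entries or a bigger fleet can bring it within a year.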
Considerations for the second data model:
- Could you store the manager's name as the "manager_id" field? Could it serve as a unique identifier for a secondary metadata lookup? Doing so would remove the need for a separate lookup of the manager's metadata just to get their name.
As @n9code pointed out, the aggregation framework is the answer in both cases.
For the first data model, assuming each document represents one day, and you want to get the average value for a given day or range of days:
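    db.collection.aggregate([
      // Restrict to one manager and a date range
      { $match: { name: 'My Manager 1', timestamp: { $gte: ISODate(...), $lt: ISODate(...) } } },
      // One output document per element of the vans array
      { $unwind: '$vans' },
      // After $unwind, the mileage lives at vans.miles, not at the document root
      { $group: { _id: { _id: '$_id', name: '$name', timestamp: '$timestamp' }, avg_mileage: { $avg: '$vans.miles' } } },
      { $sort: { avg_mileage: -1 } },
      // Flatten the compound _id back into top-level fields
      { $project: { _id: '$_id._id', name: '$_id.name', timestamp: '$_id.timestamp', avg_mileage: 1 } }
    ]);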
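To make clear what the $match, $unwind, $group and $sort stages compute, here is a plain-JavaScript sketch of the same calculation on in-memory data. The sample documents and the function name averageMilesPerDay are hypothetical, and the shape (miles stored on each element of vans) is an assumption about the first data model.

```javascript
// ASSUMPTION: hypothetical documents, one per manager per day, with the
// mileage stored on each element of the "vans" array.
const dailyLogs = [
  { _id: 1, name: 'My Manager 1', timestamp: new Date('2015-05-20T00:00:00Z'),
    vans: [{ name: 'van1', miles: 100 }, { name: 'van2', miles: 50 }] },
  { _id: 2, name: 'My Manager 1', timestamp: new Date('2015-05-21T00:00:00Z'),
    vans: [{ name: 'van1', miles: 30 }, { name: 'van2', miles: 90 }, { name: 'van3', miles: 60 }] },
  { _id: 3, name: 'My Manager 2', timestamp: new Date('2015-05-20T00:00:00Z'),
    vans: [{ name: 'van4', miles: 10 }] },
];

function averageMilesPerDay(docs, managerName, from, to) {
  return docs
    // $match: one manager, timestamp in [from, to)
    .filter(d => d.name === managerName && d.timestamp >= from && d.timestamp < to)
    // $unwind drops documents whose array is empty
    .filter(d => d.vans.length > 0)
    // $unwind + $group + $avg: average the miles across each document's vans
    .map(d => ({
      _id: d._id,
      name: d.name,
      timestamp: d.timestamp,
      avg_mileage: d.vans.reduce((sum, v) => sum + v.miles, 0) / d.vans.length,
    }))
    // $sort: highest average first
    .sort((a, b) => b.avg_mileage - a.avg_mileage);
}

const perDay = averageMilesPerDay(
  dailyLogs, 'My Manager 1',
  new Date('2015-05-20T00:00:00Z'), new Date('2015-05-22T00:00:00Z'));
console.log(perDay);
```

Because the grouping key includes the document's _id, each day keeps its own average rather than being merged into one figure for the whole range.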
If, in the first data model, each document instead represents a manager and the "vans" array grows daily, that data model is not ideal for two reasons:
- The vans array could eventually exceed the maximum document size, although that would take a lot of data.
- Restricting the query to a date range is harder and more expensive, because the timestamp would now be nested inside each "vans" element rather than sitting at the document root.
For completeness, here is the query:
    db.collection.aggregate([
      { $match: { name: 'My Manager 1' } },
      { $unwind: '$vans' },
      // The date filter can only run after $unwind, on every array element
      { $match: { 'vans.timestamp': { $gte: ISODate(...), $lt: ISODate(...) } } },
      // After $unwind, the mileage lives at vans.miles, not at the document root
      { $group: { _id: { _id: '$_id', name: '$name' }, avg_mileage: { $avg: '$vans.miles' } } },
      { $sort: { avg_mileage: -1 } },
      { $project: { _id: '$_id._id', name: '$_id.name', avg_mileage: 1 } }
    ]);
For the second data model, the aggregation is simpler. I am assuming a timestamp field has been added:
    db.collection.aggregate([
      { $match: { manager_id: ObjectId('555cf04fa3ed8cc2347b23d7'), timestamp: { $gte: ISODate(...), $lt: ISODate(...) } } },
      // Average the miles per manager and collect the distinct names seen
      { $group: { _id: '$manager_id', avg_mileage: { $avg: '$miles' }, names: { $addToSet: '$name' } } },
      { $sort: { avg_mileage: -1 } },
      { $project: { manager_id: '$_id', avg_mileage: 1, names: 1 } }
    ]);
I added an array of the names (vehicles?) that went into computing the average.
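As a rough illustration of what the $group stage with $avg and $addToSet produces, here is the same grouping done in plain JavaScript. The sample documents, the string manager ids, and the function name averageByManager are all hypothetical.

```javascript
// ASSUMPTION: hypothetical flat documents, one per van per log entry.
const vanLogs = [
  { manager_id: 'm1', name: 'van1', miles: 100 },
  { manager_id: 'm1', name: 'van2', miles: 50 },
  { manager_id: 'm1', name: 'van1', miles: 30 },
  { manager_id: 'm2', name: 'van3', miles: 80 },
];

function averageByManager(docs) {
  const groups = new Map();
  for (const { manager_id, name, miles } of docs) {
    if (!groups.has(manager_id)) {
      groups.set(manager_id, { total: 0, count: 0, names: new Set() });
    }
    const group = groups.get(manager_id);
    group.total += miles;   // running sum and count give the $avg
    group.count += 1;
    group.names.add(name);  // a Set keeps each name once, like $addToSet
  }
  return [...groups]
    .map(([manager_id, g]) => ({
      manager_id,
      avg_mileage: g.total / g.count,
      names: [...g.names],
    }))
    // $sort: highest average first
    .sort((a, b) => b.avg_mileage - a.avg_mileage);
}

const byManager = averageByManager(vanLogs);
console.log(byManager);
```

Note that $addToSet does not guarantee any particular order for the collected names, so treat the names array as an unordered set.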