MapReduce is slow, but it can handle very large data sets. The Aggregation Framework, on the other hand, is a little faster, but will struggle with large data volumes.
The trouble with the schema shown is that you need to $unwind the arrays to get at the data. That means creating a new stream document for every array element, and with the Aggregation Framework it all has to happen in memory. So if you have 1,000 documents with 100 array elements each, it has to build a stream of 100,000 documents just so $group can count them.
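For illustration, here is a minimal sketch (not from the original schema) of what $unwind does: a single document with an n-element array becomes n stream documents, one per element:

    // one document with a two-element array...
    { _id: 1, items: [ { sku: 'a' }, { sku: 'b' } ] }

    // ...becomes two documents after { $unwind: "$items" }:
    { _id: 1, items: { sku: 'a' } }
    { _id: 1, items: { sku: 'b' } }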
You might want to consider whether there is a schema layout that serves your queries better, but if you want to do it with the Aggregation Framework, here is how you could (with some sample data so the whole script drops straight into the shell):
    db.so.remove({});
    db.so.ensureIndex({ "items.sku": 1 }, { unique: false });
    db.so.insert([
        {
            _id: 42,
            last_modified: ISODate("2012-03-09T20:55:36Z"),
            status: 'active',
            items: [
                { sku: '00e8da9b', qty: 1, item_details: {} },
                { sku: '0ab42f88', qty: 4, item_details: {} },
                { sku: '0ab42f88', qty: 4, item_details: {} },
                { sku: '0ab42f88', qty: 4, item_details: {} },
            ]
        },
        {
            _id: 43,
            last_modified: ISODate("2012-03-09T20:55:36Z"),
            status: 'active',
            items: [
                { sku: '00e8da9b', qty: 1, item_details: {} },
                { sku: '0ab42f88', qty: 4, item_details: {} },
            ]
        },
    ]);

    db.so.runCommand("aggregate", {
        pipeline: [
            // $unwind splits the array: one stream document per array element
            { $unwind: "$items" },
            // first $group: reduce the stream to the unique doc/sku pairs,
            // so a sku appearing several times in one document is kept once
            { $group: { _id: { doc: "$_id", sku: "$items.sku" } } },
            // second $group: count how many documents each sku appears in
            { $group: { _id: "$_id.sku", doc_count: { $sum: 1 } } }
        ]
    });
Note that I have $group'd twice, because you said that a SKU can only be counted once per document, so we first need to collect the unique doc/sku pairs and then count them.
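For reference, with the sample data above each of the two skus appears in both documents, so the command should return something along these lines (the exact envelope depends on your server version; older servers wrap the rows in a result array):

    {
        "result" : [
            { "_id" : "0ab42f88", "doc_count" : 2 },
            { "_id" : "00e8da9b", "doc_count" : 2 }
        ],
        "ok" : 1
    }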
If you want the output in a slightly different shape (in other words, EXACTLY as in your example), we can $project them.
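As a sketch, assuming you want a flat { sku, doc_count } shape, a final stage like the following would rename the grouped _id back to sku; adjust the field names to match whatever your example output looks like:

    // appended as the last pipeline stage
    { $project: { _id: 0, sku: "$_id", doc_count: 1 } }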