In MongoDB, how to use the aggregation framework to have small documents

Question

I am trying out the aggregation framework, but I have a problem. I need to know how many people in my database have purchased something in the last month.

For this, I use the following:

db.account.aggregate([
    {$project: {civility: 1, 'purchase.date': 1}},
    {$match: {civility: 1, 'purchase.date': {$gte: new Date('02/02/2013'), $lt: new Date('02/03/2013')}}},
    {$unwind: '$purchase'},
    {$match: {civility: 1, 'purchase.date': {$gte: new Date('02/02/2013'), $lt: new Date('02/03/2013')}}},
    {$group: {_id: '$_id', total_buy: {$sum: 1}}},
    {$match: {total_buy: {$gte: 2}}},
    {$group: {_id: null, total_buyer: {$sum: 1}}}
])

I get this result:

 { "result" : [ { "_id" : null, "total_buyer" : 4443 } ], "ok" : 1 }

This query works because the date range I'm using is small, but if I run the same query with a larger date range, like this:

 db.account.aggregate([
     {$project: {civility: 1, 'purchase.date': 1}},
     {$match: {civility: 1, 'purchase.date': {$gte: new Date('02/01/2013'), $lt: new Date('03/01/2013')}}},
     {$unwind: '$purchase'},
     {$match: {civility: 1, 'purchase.date': {$gte: new Date('02/01/2013'), $lt: new Date('03/01/2013')}}},
     {$group: {_id: '$_id', total_buy: {$sum: 1}}},
     {$match: {total_buy: {$gte: 2}}},
     {$group: {_id: null, total_buyer: {$sum: 1}}}
 ])

I get this error:

 { "errmsg" : "exception: sharded pipeline failed on shard shard0000: { errmsg: \"exception: aggregation result exceeds maximum document size (16MB)\", code: 16389, ok: 0.0 }", "code" : 16390, "ok" : 0 }

Am I doing something wrong, or is it not possible to do what I need?

Thanks in advance.

mongodb

Toullettes Mar 14 '13 at 10:52

1 answer

Asya Kamsky · Answer 1 · 2013-09-01T20:39:10+0000

It looks like you can do a few things to improve the aggregation:

1) Add a $project to avoid passing along fields that you have finished using (everything except _id).

2) You say you want the number of buyers who bought something, but your filter keeps only buyers who bought two or more "times" or "things" in the period. If you want everyone who bought at least once, drop that filter.

Result:

 db.account.aggregate([
     {$project: {civility: 1, 'purchase.date': 1}},
     {$match: {civility: 1, 'purchase.date': {$gte: new Date('02/01/2013'), $lt: new Date('03/01/2013')}}},
     {$unwind: '$purchase'},
     {$match: {civility: 1, 'purchase.date': {$gte: new Date('02/01/2013'), $lt: new Date('03/01/2013')}}},
     {$project: {_id: 1}},
     {$group: {_id: '$_id', total_buys: {$sum: 1}}},
     {$group: {_id: null, total_buyers: {$sum: 1}}}
 ])
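To make point 2 concrete, here is a minimal sketch in plain JavaScript (not a MongoDB query) that applies both counting rules to made-up data. The sample accounts and the countBuyers helper are hypothetical and exist only for illustration; only the civility and purchase.date field names come from the question.

```javascript
// Hypothetical sample data, shaped like the account documents in the question.
const accounts = [
  { _id: 1, civility: 1, purchase: [{ date: new Date(2013, 1, 5) }, { date: new Date(2013, 1, 20) }] },
  { _id: 2, civility: 1, purchase: [{ date: new Date(2013, 1, 10) }] },
  { _id: 3, civility: 1, purchase: [{ date: new Date(2013, 3, 1) }] },
  { _id: 4, civility: 2, purchase: [{ date: new Date(2013, 1, 12) }] }
];

// JavaScript months are zero-based: 1 is February, 2 is March.
const start = new Date(2013, 1, 1);
const end = new Date(2013, 2, 1);

// Hypothetical helper: counts accounts with civility 1 that have at least
// minPurchases purchases dated in [start, end).
function countBuyers(docs, minPurchases) {
  return docs.filter(function (doc) {
    if (doc.civility !== 1) return false;
    const inRange = doc.purchase.filter(function (p) {
      return p.date >= start && p.date < end;
    });
    return inRange.length >= minPurchases;
  }).length;
}

const atLeastOne = countBuyers(accounts, 1); // buyers who bought something
const atLeastTwo = countBuyers(accounts, 2); // buyers who bought twice or more
console.log(atLeastOne, atLeastTwo);
```

On this sample, requiring one purchase counts two buyers (accounts 1 and 2), while requiring two counts only account 1. The pipeline above computes the first kind of count.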

Given the size of your _id field, this pipeline should work in the current version (2.4) as long as each shard matches no more than 420,000 documents. Since each document represents a purchase, I suspect that you may still run into the limit, so you have a couple of options:

1) Wait for 2.6 (currently available as the unstable development release 2.5.2), which removes the limit on the size of the result set. Note that the problem here is not the size of the final result but the size of what shard0000 has to return to mongos.

2) Use a different method to count distinct buyers over a period of time (if that is what you really want; it is not exactly what your original aggregation computes).
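One possible reading of option 2, sketched under assumptions: if the goal is only the number of accounts with at least one purchase in the period, a plain count with $elemMatch may be enough, since it returns a single number rather than a per-buyer result. The helper name buyersInPeriodFilter is hypothetical, and whether this fits depends on your schema and on what you actually want to count.

```javascript
// Hypothetical helper: builds a filter matching accounts that have at least
// one purchase whose date falls in [start, end).
function buyersInPeriodFilter(start, end) {
  return {
    civility: 1,
    // $elemMatch requires a single purchase element to satisfy both bounds.
    purchase: { $elemMatch: { date: { $gte: start, $lt: end } } }
  };
}

// JavaScript months are zero-based: 1 is February, 2 is March.
const filter = buyersInPeriodFilter(new Date(2013, 1, 1), new Date(2013, 2, 1));

// In the mongo shell (assumed usage, not run here):
//   db.account.count(filter)
```

Unlike the aggregation, this counts each account once regardless of how many purchases it made in the period.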
