MongoDB, return the last document for each user_id in the collection

Look for similar functionality for Postgres' Distinct On.

You have a collection of documents {user_id, current_status, date}, where status is text only and date is date. Still in the early stages of head wrapping around the mongo and getting the best way to do something.

Would mapreduce be the best solution here, does the card emit everything, and does the abbreviation record last, or is there a built-in solution without pulling out mr?

+4
source share
3 answers

There is a distinct command, however I'm not sure what you need. Distinct is a kind of query command, and with a large number of users, you probably want to collapse data in real time.

Map-Reduce is probably one way to go here.

Card Phase: Your key will be just an identifier. Your value will look like {current_status:'blah',date:1234} .

Reduce phase: Given an array of values, you would capture the most recent and return only it.

To make this work optimal, you probably want to take a look at the new feature from 1.8.0. Reduce / Reduce function. Allows you to process only new data instead of reprocessing the entire collection of statuses.

Another way to do this is to create the “most recent” collection and link the status insert to this collection. Therefore, when you insert a new status for the user, you update his "most recent".

Depending on the importance of this feature, you could do both things.

0
source

Current solution that works well.

 map = function () {emit(this.user.id, this.created_at);} //We call new date just in case somethings not being stored as a date and instead just a string, cause my date gathering/inserting function is kind of stupid atm reduce = function(key, values) { return new Date(Math.max.apply(Math, values.map(function(x){return new Date(x)})))} res = db.statuses.mapReduce(map,reduce); 
0
source

Another way to achieve the same result is to use the group command, which is a kind of mr-shortcut that allows you to collect data for a specific key or set of keys. In your case, it will look like this:

 db.coll.group({ key : { user_id: true }, reduce : function(obj, prev) { if (new Date(obj.date) < prev.date) { prev.status = obj.status; prev.date = obj.date; } }, initial : { status : "" } }) 

However, if you do not have a fairly small fixed number of users, I am sure that the best solution would be, as previously suggested, to keep a separate collection containing only the most recent status message for each user.

0
source

Source: https://habr.com/ru/post/1341710/


All Articles