I found this solution, which works with MongoDB 3.4. I assume the field that contains the duplicates is called fieldX:
```javascript
db.collection.aggregate([
  {
    // only match documents that have this field
    // you can omit this stage if you don't have missing fieldX
    $match: { "fieldX": { $nin: [null] } }
  },
  {
    $group: { "_id": "$fieldX", "doc": { "$first": "$$ROOT" } }
  },
  {
    $replaceRoot: { "newRoot": "$doc" }
  }
], { allowDiskUse: true })
```
As a MongoDB newbie, I spent a lot of time on other, longer solutions for finding and removing duplicates, but I think this one is neat and easy to understand.
It works by first matching only the documents that contain fieldX. Several of my documents lacked this field, and without the $match stage I got one extra result, because all documents without fieldX are grouped together under a null _id.
The $group stage then groups the documents by fieldX and, using $first with $$ROOT, keeps only the first whole document of each group. Finally, $replaceRoot replaces each grouped result with that stored document, so the output has the same shape as the original documents.
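To make the three stages concrete, here is a rough in-memory sketch in plain JavaScript (Node.js) of what the pipeline computes. It is not how MongoDB executes the aggregation; the function name `dedupeByField` and the sample documents are made up for illustration.

```javascript
// Hypothetical in-memory emulation of:
//   $match:       { fieldX: { $nin: [null] } }
//   $group:       { _id: "$fieldX", doc: { $first: "$$ROOT" } }
//   $replaceRoot: { newRoot: "$doc" }
function dedupeByField(docs, field) {
  const firstByKey = new Map(); // group key -> first whole document seen ($first of $$ROOT)
  for (const doc of docs) {
    const value = doc[field];
    // $match with $nin: [null] drops documents where the field is missing or null
    if (value === undefined || value === null) continue;
    // $group keeps only the first document it encounters for each key
    const key = JSON.stringify(value);
    if (!firstByKey.has(key)) firstByKey.set(key, doc);
  }
  // $replaceRoot: each output row is the stored original document
  // (MongoDB itself does not guarantee the order of $group output)
  return [...firstByKey.values()];
}

const docs = [
  { _id: 1, fieldX: "a", n: 1 },
  { _id: 2, fieldX: "b", n: 2 },
  { _id: 3, fieldX: "a", n: 3 },  // duplicate of "a": dropped
  { _id: 4, n: 4 },               // no fieldX: removed by $match
  { _id: 5, fieldX: null, n: 5 }, // null fieldX: also removed
];
// keeps _id 1 (fieldX "a") and _id 2 (fieldX "b")
console.log(dedupeByField(docs, "fieldX"));
```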
I had to add allowDiskUse because my collection is large.
You can add these stages after any number of other pipeline stages. Although the $first documentation mentions adding a $sort stage before using $first, it worked for me without one; just note that without a $sort, which document counts as "first" in each group is not guaranteed.
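If it matters which duplicate survives, putting a $sort stage before $group makes $first deterministic; in the shell that would be something like `{ $sort: { updatedAt: -1, _id: 1 } }` before the `$group` stage. The sketch below emulates that in plain JavaScript, assuming a hypothetical `updatedAt` field and keeping the most recently updated document for each fieldX value. The helper `keepNewestPerKey` and the data are illustrative only.

```javascript
// Hypothetical emulation of:
//   $sort:  { updatedAt: -1, _id: 1 }
//   $group: { _id: "$fieldX", doc: { $first: "$$ROOT" } }
// "updatedAt" is an assumed field; _id breaks ties so the result is stable.
function keepNewestPerKey(docs, field) {
  const sorted = [...docs].sort(
    (a, b) => b.updatedAt - a.updatedAt || a._id - b._id
  );
  const firstByKey = new Map();
  for (const doc of sorted) {
    const value = doc[field];
    if (value === undefined || value === null) continue; // same filter as the $match stage
    const key = JSON.stringify(value);
    if (!firstByKey.has(key)) firstByKey.set(key, doc); // first after sorting = newest
  }
  return [...firstByKey.values()];
}

const rows = [
  { _id: 1, fieldX: "a", updatedAt: 10 },
  { _id: 2, fieldX: "a", updatedAt: 30 },
  { _id: 3, fieldX: "b", updatedAt: 20 },
];
console.log(keepNewestPerKey(rows, "fieldX").map((d) => d._id)); // [ 2, 3 ]
```

Without the sort, the emulation would keep _id 1 for "a" simply because it comes first in the input, which mirrors why the result in MongoDB depends on document order.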
You can save the results to a new collection by adding an $out stage at the end of the pipeline.
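One way to keep these stages together is a small helper that builds the pipeline array; the helper name `buildDedupPipeline`, its options, and the target collection name `deduped` are hypothetical. $out has to be the last stage, and it replaces the target collection if that collection already exists.

```javascript
// Hypothetical helper: builds the de-duplication pipeline, optionally
// sorting first and writing the result to another collection with $out.
function buildDedupPipeline(field, { sort, outCollection } = {}) {
  const pipeline = [
    { $match: { [field]: { $nin: [null] } } }, // skip documents without the field
  ];
  if (sort) pipeline.push({ $sort: sort }); // decides which document is "first"
  pipeline.push(
    { $group: { _id: "$" + field, doc: { $first: "$$ROOT" } } },
    { $replaceRoot: { newRoot: "$doc" } }
  );
  if (outCollection) pipeline.push({ $out: outCollection }); // $out must be the last stage
  return pipeline;
}

const pipeline = buildDedupPipeline("fieldX", { outCollection: "deduped" });
console.log(JSON.stringify(pipeline));
// In the mongo shell (not run here):
// db.collection.aggregate(pipeline, { allowDiskUse: true })
```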
Alternatively, if you are only interested in a few fields (for example, field1 and field2) rather than the whole document, you can select them in the $group stage and skip $replaceRoot:
```javascript
db.collection.aggregate([
  {
    // only match documents that have this field
    $match: { "fieldX": { $nin: [null] } }
  },
  {
    $group: {
      "_id": "$fieldX",
      "field1": { "$first": "$$ROOT.field1" },
      "field2": { "$first": "$field2" }
    }
  }
], { allowDiskUse: true })
```
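With this variant, each output row has `_id` set to the fieldX value plus the chosen fields, and nothing else. A rough in-memory JavaScript emulation (hypothetical `pickFirstFields` helper, made-up data) shows that shape; `"$$ROOT.field1"` and `"$field1"` refer to the same value inside $group, so either spelling works.

```javascript
// Hypothetical emulation of:
//   $match: { fieldX: { $nin: [null] } }
//   $group: { _id: "$fieldX",
//             field1: { $first: "$field1" },
//             field2: { $first: "$field2" } }
function pickFirstFields(docs, groupField, fields) {
  const rows = new Map(); // group key -> output row
  for (const doc of docs) {
    const value = doc[groupField];
    if (value === undefined || value === null) continue; // the $match stage
    const key = JSON.stringify(value);
    if (rows.has(key)) continue; // only the first document per group contributes
    const row = { _id: value }; // $group puts the group key in _id
    for (const f of fields) row[f] = doc[f];
    rows.set(key, row);
  }
  return [...rows.values()];
}

const data = [
  { _id: 1, fieldX: "a", field1: "x", field2: 10, other: true },
  { _id: 2, fieldX: "a", field1: "y", field2: 20, other: false },
  { _id: 3, fieldX: "b", field1: "z", field2: 30, other: true },
];
// one row per fieldX value; "other" and the original _id are not included
console.log(pickFirstFields(data, "fieldX", ["field1", "field2"]));
```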