The accepted answer is terribly slow on large collections and does not return the _ids of the duplicate entries.
Aggregation is much faster and can return the _ids:
db.collection.aggregate([
  { $group: {
      _id: { name: "$name" }, // replace `name` here twice
      uniqueIds: { $addToSet: "$_id" },
      count: { $sum: 1 }
  }},
  { $match: { count: { $gte: 2 } } },
  { $sort: { count: -1 } },
  { $limit: 10 }
]);
In the first stage of the pipeline, the $group operator groups the documents by the name field and collects the _id of every record in each group into uniqueIds. The $sum operator adds up the values passed to it, in this case the constant 1, thereby counting the number of grouped records in the count field.
In the second stage of the pipeline, we use $match to keep only the documents with a count of at least 2, i.e. the duplicates.
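To make those first two stages concrete, here is a hypothetical miniature collection (the documents and _id values are invented for illustration) and what each stage emits:

// Hypothetical input documents:
//   { _id: 1, name: "Broom" }
//   { _id: 2, name: "Broom" }
//   { _id: 3, name: "Mop" }

// After the $group stage:
//   { _id: { name: "Broom" }, uniqueIds: [ 1, 2 ], count: 2 }
//   { _id: { name: "Mop" },   uniqueIds: [ 3 ],    count: 1 }

// After $match with { count: { $gte: 2 } }, the "Mop" group is dropped:
//   { _id: { name: "Broom" }, uniqueIds: [ 1, 2 ], count: 2 }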
Finally, we sort so that the most frequent duplicates come first, and limit the results to the top 10.
This query returns up to $limit entries with duplicate names, along with their _ids. For example:
{ "_id" : { "name" : "Toothpick" }, "uniqueIds" : [ "xzuzJd2qatfJCSvkN", "9bpewBsKbrGBQexv4", "fi3Gscg9M64BQdArv", ], "count" : 3 }, { "_id" : { "name" : "Broom" }, "uniqueIds" : [ "3vwny3YEj2qBsmmhA", "gJeWGcuX6Wk69oFYD" ], "count" : 2 }