I am trying to use a MongoDB aggregation query to join ($lookup) two collections and then count the distinct values of each key across the combined metaDataMap objects. Note: I do not necessarily know which fields (keys) will be in metaDataMap, and I do not want to explicitly read or include fields that may or may not exist in the map. That is why the aggregation query looks the way it does.
My two collections look like this:

events -
{ "_id" : "1", "name" : "event1", "objectsIds" : [ "1", "2", "3" ], }
objects -
{ "_id" : "1", "name" : "object1", "metaDataMap" : { "SOURCE" : ["ABC", "DEF"], "DESTINATION" : ["XYZ", "PDQ"], "TYPE" : [] } }, { "_id" : "2", "name" : "object2", "metaDataMap" : { "SOURCE" : ["RST", "LNE"], "TYPE" : ["text"] } }, { "_id" : "3", "name" : "object3", "metaDataMap" : { "SOURCE" : ["NOP"], "DESTINATION" : ["PHI", "NYC"], "TYPE" : ["video"] } }
My desired results:
{ _id:"SOURCE", count:5 _id:"DESTINATION", count: 4 _id:"TYPE", count: 2 }
What I have so far:
db.events.aggregate([
    {$match: {"_id" : id}}
    ,{$lookup: {"from" : "objects", "localField" : "objectsIds", "foreignField" : "_id", "as" : "objectResults"}}
    ,{$unwind: "$objectResults"}                      //Line 1
    ,{$project: {x: "$objectResults.metaDataMap"}}    //Line 2
    ,{$unwind: "$x"}
    ,{$project: {"_id": 0}}
    ,{$project: {x: {$objectToArray: "$x"}}}
    ,{$unwind: "$x"}
    ,{$group: {_id: "$x.k", tmp: {$push: "$x.v"}}}
    ,{$addFields: {tmp: {$reduce: {
        input: "$tmp",
        initialValue: [],
        in: {$concatArrays: ["$$value", "$$this"]}
    }}}}
    ,{$unwind: "$tmp"}
    ,{$group: {_id: "$_id", uniqueVals: {$addToSet: "$tmp"}}}
    ,{$addFields: {count: {"$size": "$uniqueVals"}}}
    ,{$project: {_id: "$_id", count: "$count"}}
]);
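For reference, the $objectToArray / $unwind steps are what let me group without knowing the map keys in advance. My understanding is that $objectToArray turns each metaDataMap into an array of k/v pairs, so by the $group stage the documents look roughly like this (a sketch based on object1 above, not actual output):

    { "x" : { "k" : "SOURCE", "v" : ["ABC", "DEF"] } }
    { "x" : { "k" : "DESTINATION", "v" : ["XYZ", "PDQ"] } }
    { "x" : { "k" : "TYPE", "v" : [] } }

which is why the group key is "$x.k" and the pushed value is "$x.v".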
My problem is with the lines marked Line 1 and Line 2 above. The pipeline works, but it takes about 50 seconds when there are around 25,000 values across the metaDataMap fields (objectResults.metaDataMap), e.g. 25,000 values in the SOURCE array of object1's metaDataMap. That is far too slow. A quicker way to do this is to replace Line 1 and Line 2 with:
    ,{$project: {x: "$objectResults.metaDataMap"}}    //Line 1
    ,{$unwind: "$x"}                                  //Line 2
This is much faster (under 3 seconds), but it only works on data sets with roughly 10,000 elements or fewer. Anything above that and I get the error "exceeds maximum document size".
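My assumption is that the size problem comes from the $lookup stage embedding every matching object into the single event document before anything is unwound, i.e. something roughly like this (abridged sketch, not actual output):

    {
        "_id" : "1", "name" : "event1", "objectsIds" : [ "1", "2", "3" ],
        "objectResults" : [
            { "_id" : "1", "name" : "object1", "metaDataMap" : { "SOURCE" : ["ABC", "DEF"], ... } },
            { "_id" : "2", "name" : "object2", "metaDataMap" : { "SOURCE" : ["RST", "LNE"], ... } },
            { "_id" : "3", "name" : "object3", "metaDataMap" : { "SOURCE" : ["NOP"], ... } }
        ]
    }

With 25,000 values per array, that combined document would be very large, which seems consistent with hitting the 16 MB document limit.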
Please, help!
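In case it helps with diagnosing this, execution stats for the pipeline can be pulled with something along these lines (a sketch only; I believe executionStats output for aggregate needs a reasonably recent MongoDB version):

    db.events.explain("executionStats").aggregate([
        {$match: {"_id" : id}}
        ,{$lookup: {"from" : "objects", "localField" : "objectsIds", "foreignField" : "_id", "as" : "objectResults"}}
        // ... remaining stages as above
    ]);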