Join MongoDB MapReduce

I used MapReduce before to do the classic MR job, equivalent to GROUP BY in SQL.

I was wondering if it is possible to conceptually perform a JOIN operation with MapReduce. Any ideas how this can be implemented? Does it make sense to use MapReduce for this kind of operation?

Thanks!

2 answers

MongoDB does not support relational operations like joins. Instead, you can denormalize your data by embedding the rows you would otherwise join inside the outer document. So instead of joining Products to Sales, you can have a products collection with this schema:

Products

    {
      _id: 123,
      name: "Widget",
      price: 9.99,
      sales: [
        { id: 1, date: "20100316", howMany: 2 },
        { id: 2, date: "20100316", howMany: 5 }
      ]
    }

Then, whenever you retrieve a product, you also get its sales data, so there is no need to join or look for information elsewhere.
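As a rough illustration of the denormalization step, here is a minimal plain-JavaScript sketch that folds flat sales rows into their parent product documents. The helper name `embedSales` and the `productid` field on each sale row are assumptions made for this example; nothing here is a MongoDB API, and in practice you would write the resulting documents back to the products collection yourself.

```javascript
// Hypothetical helper: fold flat sales rows into their parent product documents.
// Assumes each sale row carries a `productid` that matches a product's `_id`.
function embedSales(products, sales) {
  // Group the sales by productid so the sales list is scanned only once.
  const byProduct = new Map();
  for (const sale of sales) {
    const { productid, ...rest } = sale; // drop the foreign key from the embedded copy
    if (!byProduct.has(productid)) byProduct.set(productid, []);
    byProduct.get(productid).push(rest);
  }
  // Attach the matching sales (or an empty array) to every product.
  return products.map((p) => ({ ...p, sales: byProduct.get(p._id) || [] }));
}

const products = [{ _id: 123, name: "Widget", price: 9.99 }];
const sales = [
  { id: 1, productid: 123, date: "20100316", howMany: 2 },
  { id: 2, productid: 123, date: "20100316", howMany: 5 },
];

const denormalized = embedSales(products, sales);
console.log(JSON.stringify(denormalized[0], null, 2));
```

The trade-off is the usual one for embedding: reads need a single lookup, but every new sale means updating the product document.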

Alternatively, you can split into two collections, as you could with a relational database, and then use an additional query to get product sales, something like this:

SQL: SELECT * FROM Sales WHERE ProductId = 123

MongoDB: db.sales.find( { productid: 123 } )

Products

 { _id: 123, name: "Widget", price: 9.99 } 

Sales

    { id: 1, productid: 123, date: "20100316", howMany: 2 }
    { id: 2, productid: 123, date: "20100316", howMany: 5 }

My approach is below:

Looking briefly at Hadoop, I found the CompositeInputFormat approach, which takes two or more collections as input for a map-reduce job.

According to my investigation, MongoDB does not provide this yet. MongoDB mapReduce runs on one collection at a time. (Please correct me if I am wrong.)

So I decided to put the collections that need to be joined into one collection, on which I run mapReduce to get the equivalent of a SQL right join.

This is from my log reporter project. If there were no "clock" field, the first map-reduce phase alone would be enough to perform the right join. The second map-reduce phase is there to remove the superfluous right-join rows caused by the clock field.

    db.test.drop();
    db.test.insert({ "username": 1, "day": 1, "clock": 0 });
    db.test.insert({ "username": 1, "day": 1, "clock": 1 });
    db.test.insert({ "username": 1, startDay: 1, endDay: 2, "table": "user" });
    // startDay and endDay define the employee's working period
    // (joined the company - left the company).
    // You could use an array of days instead of a range, e.g. day: [1, 2, 3, ...]

    // Phase 1: emit one key per (username, day) from both kinds of document.
    m1 = function () {
      if (typeof this.table != "undefined" && this.table != null) {
        // "user" document: emit a "join" marker for every working day.
        var username = this.username;
        var startDay = this.startDay;
        var endDay = this.endDay;
        while (startDay <= endDay) {
          emit({ username: username, day: startDay }, { clocks: ["join"] });
          // emit({ username: username, day: startDay }, 1);
          startDay++;
        }
      } else {
        // clock document: emit the clock value for its day.
        emit({ username: this.username, day: this.day }, { clocks: [this.clock] });
      }
    };

    r1 = function (key, values) {
      var result = { clocks: [] };
      values.forEach(function (x) {
        result.clocks = x.clocks.concat(result.clocks);
        // Drop the "join" marker once real clock values exist for this key.
        result.clocks = result.clocks.filter(function (element, index, array) {
          return element != "join";
        });
      });
      return result;
    };

    db.test.mapReduce(m1, r1, { out: "result1" });
    db.test.find();
    db.result1.find();

    // Phase 2: flatten the clocks array into one row per (username, day, clock).
    m2 = function () {
      var key = this._id;
      this.value.clocks.forEach(function (x) {
        key.clock = x;
        emit(key, 1);
      });
    };

    r2 = function (key, values) {
      var value = 0;
      values.forEach(function (x) {
        value += x; // sum the counts so that re-reducing stays correct
      });
      return value; // the original returned an undefined `result` here
    };

    db.result1.mapReduce(m2, r2, { out: "result2" });
    db.test.find();
    db.result2.find();
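To see what the first phase does without a running server, here is a minimal in-memory sketch in plain JavaScript. The `mapReduce` function below is a hypothetical stand-in written for this example, not the MongoDB API: it passes `emit` to the map function as an argument (in the mongo shell `emit` is a global), and it imitates the fact that reduce is not called for a key that has only one value.

```javascript
// Hypothetical in-memory stand-in for mapReduce (NOT the MongoDB implementation).
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // serialized key -> { key, values }
  const emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));
  // Reduce is skipped for keys with a single emitted value.
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

// Both "tables" live in one collection, as in the answer above.
const docs = [
  { username: 1, day: 1, clock: 0 },
  { username: 1, day: 1, clock: 1 },
  { username: 1, startDay: 1, endDay: 2, table: "user" },
];

function map(emit) {
  if (this.table != null) {
    // "user" document: one "join" marker per working day.
    for (let day = this.startDay; day <= this.endDay; day++) {
      emit({ username: this.username, day }, { clocks: ["join"] });
    }
  } else {
    // clock document: the clock value for its day.
    emit({ username: this.username, day: this.day }, { clocks: [this.clock] });
  }
}

function reduce(key, values) {
  // Merge all clock lists and drop the "join" marker.
  const clocks = values.flatMap((v) => v.clocks).filter((c) => c !== "join");
  return { clocks };
}

const joined = mapReduce(docs, map, reduce);
console.log(JSON.stringify(joined));
```

Day 1 ends up with both clock values, while day 2 keeps only the "join" marker because no clock document matched it. That unmatched row is what makes the result behave like a right join.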

Source: https://habr.com/ru/post/888409/