Pig: Outer join of more than two relations

I want to do an outer join that includes 3 tables. I tried with this:

features = JOIN group_event by group left outer, group_session by group, group_order by group; 

I want every group_event row to appear in the output, even if only one, or neither, of the other two relations has a matching row.

The statement above does not work. Apparently it is not supposed to, according to the documentation (http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#JOIN+%28outer%29):

 Outer joins will only work for two-way joins; to perform a multi-way outer join, you will need to perform multiple two-way outer join statements. 

Splitting it into two statements works and can be done as follows:

 features1 = JOIN group_event by group left outer, group_session by group;
 features2 = JOIN features1 by group_event::group left outer, group_order by group;

Any ideas on how to do this in a single statement? (That would be helpful if I were joining even more tables.)

1 answer

I think that at some point we have to trust the documentation and not try to perform several outer joins in a single statement.

Why? How should the following line work?

 JOIN a BY a1 LEFT OUTER, b BY b1, c BY c1 

Does LEFT OUTER apply to both tables or only to the first? If only to the first, should the LEFT OUTER between b and c drop all records that have no match in b? Or in a? The more you look at it, the less sense it makes, right?

What you want to do is JOIN relation a with b into ab, and then ab with c. If you think about it, it isn't natural to do that within a single statement, because of the intermediate state ab.
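
The two-step approach described above can be sketched for the generic relations a, b and c as follows. This is only an illustration: the file names and schemas are hypothetical placeholders, and the same pattern should extend to further relations by adding one more two-way join per relation.

```pig
-- Hypothetical inputs: the file names and schemas are placeholders.
-- Schemas are declared so that Pig can pad unmatched rows with nulls.
a = LOAD 'a.tsv' AS (a1:chararray, a2:int);
b = LOAD 'b.tsv' AS (b1:chararray, b2:int);
c = LOAD 'c.tsv' AS (c1:chararray, c2:int);

-- Step 1: keep every row of a; b's fields are null where b has no match.
ab = JOIN a BY a1 LEFT OUTER, b BY b1;

-- Step 2: join the intermediate relation ab with c, again keeping every row.
-- After a join, fields are addressed as relation::field.
abc = JOIN ab BY a::a1 LEFT OUTER, c BY c1;
```

Because each statement names its own left side explicitly, there is no question about which relation the LEFT OUTER applies to.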


Source: https://habr.com/ru/post/1442883/

