Map-side join
In a map-side (fragment-replicate) join, you hold one data set in memory (for example, in a hash table) and join against the other data set as each of its records streams past. In Pig you write
edges_from_list = JOIN a_follows_b BY user_a_id, some_list BY user_id using 'replicated';
taking care that the smaller data set is on the right. This is extremely efficient: there is no shuffle, so network overhead is negligible (only the small data set is shipped to each mapper), and CPU demands are minimal.
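To make the mechanics concrete, here is a minimal Python sketch of the idea behind a fragment-replicate join. It is not Pig's implementation. The function name replicated_join, the tuple layouts, and the sample rows are assumptions made up for illustration; only the relation names come from the Pig statement above.

```python
from collections import defaultdict

def replicated_join(big_records, small_records, big_key, small_key):
    """Yield joined tuples while holding only small_records in memory."""
    # Build phase: index the small data set by its join key.
    table = defaultdict(list)
    for rec in small_records:
        table[rec[small_key]].append(rec)
    # Probe phase: stream the large data set one record at a time.
    for rec in big_records:
        for match in table.get(rec[big_key], ()):
            yield rec + match

# Made-up sample data: (user_a_id, user_b_id) and (user_id,)
a_follows_b = [(1, 2), (1, 3), (4, 2)]
some_list = [(1,), (5,)]

edges_from_list = list(replicated_join(a_follows_b, some_list, 0, 0))
print(edges_from_list)
```

Only the small data set is ever materialized. The large one is consumed as a stream, which is why the approach works only when the smaller side fits in memory.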
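Reduce-side join
In a reduce-side join, records from both data sets are grouped by the join key, so for each key the reducer receives one bag of records from each data set, for example
<user_id {A, B, F, ..., Z}, { A, C, G, ..., Q} >
and emits the cross product of the two bags: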
[A user_id A]
[A user_id C]
...
[A user_id Q]
...
[Z user_id Q]
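The pairing shown above can be sketched in a few lines of Python. This is an illustrative model, not Hadoop or Pig code: the in-memory grouping stands in for the shuffle, and the function name reduce_side_join and the sample pairs are assumptions.

```python
from collections import defaultdict
from itertools import product

def reduce_side_join(left, right):
    """Join two iterables of (key, value) pairs on key."""
    # Stand-in for the shuffle: collect one bag per input for each key.
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    # Reduce step: emit the cross product of the two bags for each key.
    for key, (left_bag, right_bag) in groups.items():
        for l, r in product(left_bag, right_bag):
            yield (l, key, r)

# Made-up sample data mirroring the bags in the text
left = [("user_id", "A"), ("user_id", "B")]
right = [("user_id", "A"), ("user_id", "C")]

joined = list(reduce_side_join(left, right))
print(joined)
```

A key that appears in only one input has an empty bag on the other side, so it contributes no output rows, which matches inner-join behavior.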