Does Hadoop's map-side join implement a hash join?

Question

I am trying to implement a hash join in Hadoop.

However, Hadoop seems to already have a map-side join and a reduce-side join.

What is the difference between these techniques and a hash join?

+3

join hash hadoop

ge0rgi0 May 12, '10 at 22:47

2 answers

Hadoop , () . , - , -. " " " " MapReduce Jimmy Lin Chris Dyer, .

0

lgylym 12 . '14 11:50

mrflip · Accepted Answer · 2010-06-03T03:22:22+0000

Map-side join

In a map-side (fragment-replicate) join, you hold one dataset in memory (in a hash table, for example) and join it against the other dataset record by record. In Pig you write

edges_from_list = JOIN a_follows_b BY user_a_id, some_list BY user_id using 'replicated';

taking care that the smaller dataset is on the right. This is extremely efficient, since there is no network overhead and minimal CPU demand.
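To make the connection to a hash join concrete, here is a minimal sketch in plain Python (not Hadoop or Pig code) of what a replicated join does inside each mapper: build a hash table from the small dataset, then stream the large dataset past it. The function name `replicated_join` and the sample tuples are made up for illustration; only the relation names come from the Pig statement above.

```python
from collections import defaultdict

def replicated_join(large_records, small_records, large_key, small_key):
    """Hash join: build a table from the small side, probe it with the large side."""
    # Build phase: the whole small dataset must fit in memory.
    table = defaultdict(list)
    for rec in small_records:
        table[small_key(rec)].append(rec)
    # Probe phase: the large dataset is streamed one record at a time.
    for rec in large_records:
        for match in table.get(large_key(rec), ()):
            yield rec + match

# Hypothetical sample data: (user_a_id, user_b_id) edges and a list of user ids.
a_follows_b = [("alice", "bob"), ("carol", "dave"), ("erin", "bob")]
some_list = [("alice",), ("erin",)]

edges_from_list = list(
    replicated_join(a_follows_b, some_list, lambda r: r[0], lambda r: r[0])
)
print(edges_from_list)
```

In a real cluster every mapper would load its own copy of the small dataset, which is why nothing has to be shuffled across the network.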

Reduce-side join

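In a reduce-side join, the records are grouped on the join key, so each key arrives together with the set of matching values from each dataset: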

<user_id   {A, B, F, ..., Z},  { A, C, G, ..., Q} >

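A record is then emitted for every pair of an element from the first set and an element from the second set: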

[A   user_id    A]
[A   user_id    C]
...
[A   user_id    Q]
...
[Z   user_id    Q]
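The grouping and pairing shown above can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions: an in-memory dictionary stands in for Hadoop's shuffle and sort, and the function name `reduce_side_join` and the sample values are hypothetical.

```python
from collections import defaultdict
from itertools import product

def reduce_side_join(left, right):
    """Join two iterables of (key, value) pairs by grouping on the key."""
    # "Shuffle": collect the values from each side under their join key.
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    # "Reduce": for each key, emit every left/right pairing.
    for key in sorted(groups):
        left_vals, right_vals = groups[key]
        for l, r in product(left_vals, right_vals):
            yield (l, key, r)

left = [("user_id", "A"), ("user_id", "B")]
right = [("user_id", "A"), ("user_id", "C")]
joined = list(reduce_side_join(left, right))
print(joined)
```

A key that appears on only one side produces no output here, which corresponds to an inner join.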

, - . Pig , . ( -, ).

, . ( , ). , ; , , .

, , . , , , , () .

- . Zebra () .
