The difference and benefits of joinWithTiny, joinWithHuge and JoinHint

Question


What is the difference between using a JoinHint and using joinWithTiny or joinWithHuge?

With JoinHint, we can use BROADCAST_HASH_FIRST, which hints that the first join input is much smaller than the second, or REPARTITION_HASH_FIRST, which hints that the first join input is slightly smaller than the second.

Meanwhile, we can also use joinWithHuge and joinWithTiny.

Are they the same? Does joinWithTiny use BROADCAST_HASH_FIRST?

Is the advantage of using these features that Flink saves the time it would otherwise spend checking the size of the join inputs?


apache-flink

Akira Sendoh Jul 17 '15 at 21:17


1 answer

Fabian Hueske · Accepted Answer · 2015-07-18T07:53:01+0000

Yes, DataSet.joinWithTiny(DataSet other) is a shortcut for DataSet.join(DataSet other, JoinHint.BROADCAST_HASH_SECOND), and DataSet.joinWithHuge(DataSet other) is a shortcut for DataSet.join(DataSet other, JoinHint.BROADCAST_HASH_FIRST).
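To make the BROADCAST_HASH_SECOND case more concrete, here is a minimal single-process sketch of a broadcast hash join in plain Java. It is not Flink code and not how Flink implements the strategy internally. The class name BroadcastHashJoinSketch, the join method and the sample data are all hypothetical, and the "broadcast" step is only modeled by building one in-memory hash table from the small input. The point it illustrates is that the second (tiny) input is hashed once and the first (large) input is only streamed past it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a single-process model of a broadcast hash join.
// This is NOT Flink code; the class and method names are hypothetical.
class BroadcastHashJoinSketch {

    // Joins (key, value) pairs of `first` (assumed large) with `second` (assumed tiny)
    // on the key. This mirrors the idea behind BROADCAST_HASH_SECOND:
    // hash the second input, then probe it with the first.
    static List<String> join(List<Map.Entry<Integer, String>> first,
                             List<Map.Entry<Integer, String>> second) {
        // Build phase: hash the small input by key (a key may occur more than once).
        Map<Integer, List<String>> table = new HashMap<>();
        for (Map.Entry<Integer, String> e : second) {
            table.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        // Probe phase: stream the large input and look up matches in the hash table.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> e : first) {
            for (String match : table.getOrDefault(e.getKey(), List.of())) {
                out.add(e.getKey() + ":" + e.getValue() + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: a larger "orders" input and a tiny "countries" input.
        List<Map.Entry<Integer, String>> orders = List.of(
                Map.entry(1, "order-a"), Map.entry(2, "order-b"),
                Map.entry(1, "order-c"), Map.entry(3, "order-d"));
        List<Map.Entry<Integer, String>> countries = List.of(
                Map.entry(1, "DE"), Map.entry(2, "JP"));

        System.out.println(join(orders, countries));
    }
}
```

In this model the large input is never hashed or reorganized, which is why the strategy pays off when one side is much smaller than the other. BROADCAST_HASH_FIRST, and therefore joinWithHuge, would simply swap the roles of the two inputs.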

