Differences and benefits of joinWithTiny, joinWithHuge and JoinHint

What is the difference between passing a JoinHint and using joinWithTiny or joinWithHuge?

With JoinHint, we can use BROADCAST_HASH_FIRST, which hints that the first join input is much smaller than the second, or REPARTITION_HASH_FIRST, which hints that the first join input is somewhat smaller than the second.

Alternatively, we can use joinWithHuge and joinWithTiny.

Are they the same? Why does joinWithTiny use BROADCAST_HASH_FIRST?

Is the advantage of these methods that the Flink job saves the time it would otherwise spend checking the size of the join inputs?

1 answer

Yes, DataSet.joinWithTiny(DataSet other) is a shortcut for DataSet.join(DataSet other, JoinHint.BROADCAST_HASH_SECOND), and DataSet.joinWithHuge(DataSet other) is a shortcut for DataSet.join(DataSet other, JoinHint.BROADCAST_HASH_FIRST).
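To make the FIRST/SECOND naming concrete, here is a minimal single-process sketch in plain Java. It is not Flink code: JoinHintSketch, Hint and Row are stand-in names invented for this illustration, and the sketch only models which input becomes the in-memory hash table (the build side). It does not model the broadcast itself, where Flink ships that input to every parallel instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinHintSketch {

    // Stand-in for two of Flink's JoinHint values; NOT the real enum.
    enum Hint { BROADCAST_HASH_FIRST, BROADCAST_HASH_SECOND }

    // Hypothetical keyed record used only in this sketch.
    static final class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
    }

    // Equi-join on key. The hint decides which input is loaded into a hash
    // table (build side); the other input is streamed past it (probe side).
    static List<String> join(List<Row> first, List<Row> second, Hint hint) {
        boolean buildFirst = (hint == Hint.BROADCAST_HASH_FIRST);
        List<Row> build = buildFirst ? first : second;
        List<Row> probe = buildFirst ? second : first;

        // Build phase: this is the side that should be the small one.
        Map<Integer, List<Row>> table = new HashMap<>();
        for (Row r : build) {
            table.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        }

        // Probe phase: look up every row of the other side.
        List<String> out = new ArrayList<>();
        for (Row p : probe) {
            for (Row b : table.getOrDefault(p.key, List.of())) {
                Row f = buildFirst ? b : p;   // keep "first-second" output order
                Row s = buildFirst ? p : b;
                out.add(f.value + "-" + s.value);
            }
        }
        return out;
    }

    // "other" is tiny, so the SECOND input is the build side.
    static List<String> joinWithTiny(List<Row> self, List<Row> tiny) {
        return join(self, tiny, Hint.BROADCAST_HASH_SECOND);
    }

    // "other" is huge, so the FIRST input (self) is the build side.
    static List<String> joinWithHuge(List<Row> self, List<Row> huge) {
        return join(self, huge, Hint.BROADCAST_HASH_FIRST);
    }

    public static void main(String[] args) {
        List<Row> orders = List.of(new Row(1, "o1"), new Row(2, "o2"), new Row(1, "o3"));
        List<Row> countries = List.of(new Row(1, "DE"), new Row(2, "FR"));
        System.out.println(joinWithTiny(orders, countries));
        System.out.println(joinWithHuge(countries, orders));
    }
}
```

In both calls in main the small countries list ends up as the build side. The hint changes how the join is executed, not which pairs it produces.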

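Apache Flink has an optimizer that chooses the join strategy based on its estimates of the input sizes. When those estimates are unavailable or inaccurate, a join hint lets you tell Flink which strategy to use.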


Source: https://habr.com/ru/post/1598445/

