Difference between session.timeout.ms and max.poll.interval.ms for Kafka 0.10.0.0 and later

I don't understand why we need both session.timeout.ms and max.poll.interval.ms. When would we use one, the other, or both? Both seem to define the upper bound on how long the coordinator waits to receive a heartbeat from the consumer before considering it dead.

Also, how does this work in versions 0.10.1.0+ based on KIP-62?

1 answer

Prior to KIP-62, only session.timeout.ms exists (i.e., Kafka 0.10.0 and earlier). max.poll.interval.ms was introduced by KIP-62 (part of Kafka 0.10.1).

KIP-62 decouples heartbeats from calls to poll() by sending them from a background heartbeat thread. This allows the processing time (i.e., the time between two consecutive poll() calls) to be longer than the heartbeat interval.
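To make the decoupling concrete, here is a minimal sketch of the idea in plain Python threads. It is not the Kafka client's code (the Java client runs its own internal heartbeat thread); `HeartbeatThread`, `send_heartbeat` and `process_records` are hypothetical names used only for illustration. The point is that heartbeats keep flowing while the main thread is busy and never calls poll().

```python
import threading
import time


class HeartbeatThread(threading.Thread):
    """Simplified model of a KIP-62-style background heartbeat thread.

    Calls a hypothetical send_heartbeat() callback every interval_s seconds,
    independently of how long the application spends between poll() calls.
    """

    def __init__(self, send_heartbeat, interval_s):
        super().__init__(daemon=True)
        self._send_heartbeat = send_heartbeat
        self._interval_s = interval_s
        self._stopped = threading.Event()

    def run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stopped.wait(self._interval_s):
            self._send_heartbeat()

    def stop(self):
        self._stopped.set()


def process_records(records, per_record_s):
    # Hypothetical stand-in for slow application processing between two poll() calls.
    for _ in records:
        time.sleep(per_record_s)


heartbeats = []
hb = HeartbeatThread(lambda: heartbeats.append(time.monotonic()), interval_s=0.05)
hb.start()
process_records(range(3), per_record_s=0.1)  # ~0.3 s without any poll()
hb.stop()
hb.join()
print(f"heartbeats sent while processing: {len(heartbeats)}")
```

Several heartbeats are recorded even though the "processing thread" spent the whole time sleeping, which is exactly what the coupled (pre-KIP-62) design could not do.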

Suppose processing a message takes 1 minute. If heartbeats and polling are coupled (i.e., before KIP-62), you need to set session.timeout.ms larger than 1 minute to prevent the consumer from timing out. However, if the consumer dies, it then also takes more than 1 minute to detect the failed consumer.
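The trade-off in the coupled design is simple arithmetic. The sketch below assumes the 1-minute processing time from the example and a hypothetical 10-second safety margin (`MARGIN_MS` is illustrative, not a Kafka setting); `min_coupled_session_timeout_ms` is likewise a hypothetical helper.

```python
def min_coupled_session_timeout_ms(processing_ms, margin_ms):
    """Pre-KIP-62 model: heartbeats are sent only from poll().

    No heartbeat can be sent while a record is being processed, so the session
    timeout must exceed the processing time between two poll() calls.
    """
    return processing_ms + margin_ms


PROCESSING_MS = 60_000  # one record takes ~1 minute, as in the example
MARGIN_MS = 10_000      # hypothetical safety margin

session_timeout_ms = min_coupled_session_timeout_ms(PROCESSING_MS, MARGIN_MS)
# Before KIP-62 the session timeout is also the crash-detection latency.
crash_detection_ms = session_timeout_ms
print(f"session.timeout.ms >= {session_timeout_ms}; crash detected after ~{crash_detection_ms} ms")
```

Because the same number bounds both processing time and failure detection, making processing slower necessarily makes crash detection slower too.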

KIP-62 decouples polling and heartbeats, allowing heartbeats to be sent between two consecutive polls. You now have two threads running, the heartbeat thread and the processing thread, so KIP-62 introduced a timeout for each: session.timeout.ms applies to the heartbeat thread, and max.poll.interval.ms applies to the processing thread.

Suppose you set session.timeout.ms=30000; the consumer's heartbeat thread must then send a heartbeat to the broker before this time expires. On the other hand, if processing a single message takes 1 minute, you can set max.poll.interval.ms to more than one minute to give the processing thread more time to process the message.
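A small sketch of how the two settings from this example relate. The dictionary keys are the real consumer property names, but the value 90000 for max.poll.interval.ms is only an illustrative choice for "more than one minute", and `check_timeouts` is a hypothetical helper, not part of any Kafka client.

```python
# Real Kafka consumer property names; the values are illustrative.
consumer_config = {
    "session.timeout.ms": 30_000,    # bound on heartbeats (heartbeat thread)
    "max.poll.interval.ms": 90_000,  # bound on time between poll() calls (processing thread)
}


def check_timeouts(config, processing_ms_per_poll):
    """Hypothetical helper: list problems for a given processing time per poll()."""
    problems = []
    if config["max.poll.interval.ms"] <= processing_ms_per_poll:
        problems.append(
            "max.poll.interval.ms must exceed the processing time between polls"
        )
    # After KIP-62, session.timeout.ms does not need to cover processing time,
    # so it is deliberately not compared with processing_ms_per_poll here.
    return problems


print(check_timeouts(consumer_config, processing_ms_per_poll=60_000))   # []
print(check_timeouts(consumer_config, processing_ms_per_poll=120_000))  # one problem
```

Only max.poll.interval.ms has to grow with processing time; session.timeout.ms can stay at 30 seconds.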

If the processing thread dies, it takes max.poll.interval.ms to detect this. However, if the whole consumer dies (and a dying processing thread most likely crashes the whole consumer, including the heartbeat thread), it takes only session.timeout.ms to detect it.
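The two failure cases can be summarised as a lookup, following the answer's description. This is a simplified model, not how the broker or client actually implement detection; `detection` and the failure labels are hypothetical names.

```python
def detection(failure, session_timeout_ms, max_poll_interval_ms):
    """Return (timeout_that_fires, time_to_detect_ms) in a simplified model."""
    if failure == "consumer_crashed":
        # The whole process is gone, so heartbeats stop and the
        # session timeout detects the failure.
        return "session.timeout.ms", session_timeout_ms
    if failure == "processing_thread_dead":
        # The heartbeat thread still keeps the session alive; only the
        # poll-interval bound notices that poll() is no longer called.
        return "max.poll.interval.ms", max_poll_interval_ms
    raise ValueError(f"unknown failure kind: {failure!r}")


print(detection("consumer_crashed", 30_000, 90_000))
print(detection("processing_thread_dead", 30_000, 90_000))
```

With the values used above, a full crash is detected after 30 seconds even though a single message may legitimately take longer than a minute to process.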

The idea is to allow quick detection of a failed consumer even if processing itself takes a long time.


Source: https://habr.com/ru/post/1257336/