How does Kafka balance partitions?

I ran into the issue of balancing load in Kafka. I created a topic with 10 partitions and started 2 consumers. The 10 partitions were divided between these consumers (5 partitions on the first and 5 on the second), and it works fine. Sometimes the first consumer works, sometimes the second.
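The 5/5 split described above can be reproduced with a small simulation. This is only a sketch of how range-style and round-robin-style assignment behave for a single topic; the class `AssignmentSketch` and its methods are hypothetical names, not part of the Kafka client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Kafka API: simulates partition assignment for one topic.
final class AssignmentSketch {

    // Range-style: each consumer gets one contiguous block of partitions.
    static List<List<Integer>> range(int numPartitions, int numConsumers) {
        int perConsumer = numPartitions / numConsumers;
        int extra = numPartitions % numConsumers; // the first `extra` consumers get one more
        List<List<Integer>> result = new ArrayList<>();
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = perConsumer + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    // Round-robin-style: partitions are dealt out to consumers one at a time.
    static List<List<Integer>> roundRobin(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < numConsumers; c++) {
            result.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            result.get(p % numConsumers).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(range(10, 2));      // [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
        System.out.println(roundRobin(10, 2)); // [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
    }
}
```

Either way each consumer owns 5 partitions; the strategies differ only in which partitions end up together.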

But at some point we may encounter a situation where, for example, the second consumer receives a message, and it takes time to process this message (for example, 10 minutes).

So my question is: how will Kafka decide which partition stores the message?

Round robin, I think, is not a good idea in this case, because messages in the partitions assigned to the second consumer will not be processed until that consumer completes its long-running work.

Updated!

According to @Milan Baran's answer, the load is balanced on the producer side. But in this case, even if we provide a custom implementation of Partitioner, the same problem remains: a message stored in a partition assigned to the consumer doing the long-running work will not be processed until that consumer completes it.

Maybe in another place there is an additional balancer?

+6
3 answers

Thank you all for your help, but I found the answer to my question. First of all, there are at least 3 places where Kafka balances load:

  • Partitions are assigned to consumers using the Round Robin or Range algorithm. This can be configured by setting partition.assignment.strategy. The default is Range.
  • At the producer level, a strategy for choosing the partition that stores the message can be applied. It can be set with partitioner.class.
  • And the answer to my question: if one consumer processes a message for a long time, Kafka considers that consumer dead and reassigns its partitions to the other consumers. So while a consumer is busy with long-running work, no partitions are assigned to it. When the consumer finishes the long-running work, partitions are assigned to it again. So no messages get stuck.
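The three balancing points listed above each map to a client setting. The following is a minimal sketch using only `java.util.Properties`. The broker address, group id, and `com.example.MyPartitioner` are placeholders, and it assumes a client version that supports `max.poll.interval.ms` (older clients relied on `session.timeout.ms` for this).

```java
import java.util.Properties;

// Sketch only: builds the settings, does not connect to a broker.
final class BalancingConfigSketch {

    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        p.setProperty("group.id", "demo-group");              // hypothetical group name
        // 1) How partitions are divided among the consumers of a group.
        p.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.RoundRobinAssignor");
        // 3) Maximum time between poll() calls before the consumer is considered
        //    failed and its partitions are reassigned (300000 ms = 5 minutes).
        p.setProperty("max.poll.interval.ms", "300000");
        return p;
    }

    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        // 2) Which partition a record is written to; hypothetical custom class.
        p.setProperty("partitioner.class", "com.example.MyPartitioner");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps());
        System.out.println(producerProps());
    }
}
```

With kafka-clients on the classpath these objects would be passed to the `KafkaConsumer` and `KafkaProducer` constructors, together with the serializer and deserializer settings omitted here. With a 5-minute limit, the 10-minute task from the question would exceed it and trigger the reassignment described in the last bullet.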
+2

The decision about which partition to use is not made by Kafka itself; the producer sending the message has to decide. Take a look at https://kafka.apache.org/documentation#producerconfigs

You can provide a partitioner class to decide which partition to choose.

partitioner.class
Partitioner class that implements the Partitioner interface. Default: org.apache.kafka.clients.producer.internals.DefaultPartitioner

Here is the description of the DefaultPartitioner strategy:

 /**
  * The default partitioning strategy:
  * <ul>
  * <li>If a partition is specified in the record, use it
  * <li>If no partition is specified but a key is present choose a partition based on a hash of the key
  * <li>If no partition or key is present choose a partition in a round-robin fashion
  */
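The three rules in that comment can be sketched as a small decision function. This is an illustration, not the real DefaultPartitioner: the class name `PartitionChoiceSketch` is hypothetical, and `Arrays.hashCode` is a stand-in for whatever hash the actual implementation applies to the key bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the three rules; not the Kafka implementation.
final class PartitionChoiceSketch {
    private final AtomicInteger counter = new AtomicInteger(0);

    int choose(Integer explicitPartition, String key, int numPartitions) {
        // Rule 1: a partition specified in the record wins.
        if (explicitPartition != null) {
            return explicitPartition;
        }
        // Rule 2: otherwise hash the key, so equal keys land in the same partition.
        if (key != null) {
            int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8)); // stand-in hash
            return Math.floorMod(hash, numPartitions);
        }
        // Rule 3: no partition and no key, so rotate through the partitions.
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionChoiceSketch sketch = new PartitionChoiceSketch();
        System.out.println(sketch.choose(7, "ignored", 10));    // explicit partition
        System.out.println(sketch.choose(null, "order-42", 10)); // keyed
        System.out.println(sketch.choose(null, null, 10));       // round robin
    }
}
```

Note that none of the rules looks at consumer state: the partition is chosen when the record is sent, regardless of how busy the consumer that owns it is.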
+2

It seems that what you need is a QUEUE: ONE partition consumed by MANY consumers. Each consumer fetches a record from the partition, processes it, and fetches another. If one consumer takes too long to process a record, the others can still fetch (different) records from the partition.

However, Kafka does NOT support this. Each partition can be consumed by only one consumer within a consumer group.

In short, you need something else to achieve this goal, for example RabbitMQ.

+2

Source: https://habr.com/ru/post/1012248/

