What happens in Kafka when reassigning partitions (esp. log sizes)?

We are currently trying out Kafka 0.9, mostly as a proof of concept. We have only just started looking into it and are still trying to figure out whether we need it. There is a lot left to learn, so please bear with me.

The current setting is as follows:

  • 3 Kafka brokers on different hosts: zkhost1, zkhost2, zkhost3
  • One topic: "myTopic"
  • The topic has 4 partitions
  • The replication factor was 1
  • We have one producer and three consumers, all in one consumer group "myGroup"

Now I tried to change the replication factor using the kafka-reassign-partitions.sh script. To do this, I created the following JSON file:

 {"version":1,
  "partitions":[
    {"topic":"myTopic","partition":0,"replicas":[0,1,2]},
    {"topic":"myTopic","partition":1,"replicas":[0,1,2]},
    {"topic":"myTopic","partition":2,"replicas":[0,1,2]},
    {"topic":"myTopic","partition":3,"replicas":[0,1,2]}
  ]
 }
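Writing this file by hand gets tedious as the partition count grows. A minimal Python sketch that builds the same plan, assuming the version-1 layout shown above (one entry per partition) and broker IDs 0, 1 and 2; `build_reassignment` is a hypothetical helper, not part of Kafka.

```python
import json


def build_reassignment(topic, num_partitions, broker_ids):
    """Build a reassignment plan (hypothetical helper) that places every
    partition of `topic` on all of `broker_ids`, i.e. with a replication
    factor of len(broker_ids)."""
    partitions = [
        {"topic": topic, "partition": p, "replicas": list(broker_ids)}
        for p in range(num_partitions)
    ]
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # Assumed values taken from the setup above: 4 partitions, brokers 0..2.
    plan = build_reassignment("myTopic", 4, [0, 1, 2])
    # Redirect stdout to increase-replication-factor.json to use it.
    print(json.dumps(plan, indent=2))
```

Since broker 0 is listed first in every replica list, it is the preferred leader for all four partitions (consistent with `Leader: 0` in the describe output); rotating the list per partition would spread leadership across the brokers.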

... and then ran the script:

 kafka/bin/kafka-reassign-partitions.sh --zookeeper zkhost1:2181,zkhost2:2181,zkhost3:2181 --reassignment-json-file increase-replication-factor.json --execute 

It ran smoothly, and afterwards I got the expected replication:

 Topic:myTopic  PartitionCount:4  ReplicationFactor:3  Configs:
   Topic: myTopic  Partition: 0  Leader: 0  Replicas: 0,1,2  Isr: 0,2,1
   Topic: myTopic  Partition: 1  Leader: 0  Replicas: 0,1,2  Isr: 0,2,1
   Topic: myTopic  Partition: 2  Leader: 0  Replicas: 0,1,2  Isr: 0,2,1
   Topic: myTopic  Partition: 3  Leader: 0  Replicas: 0,1,2  Isr: 0,2,1
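Whether the new replicas have caught up can be read off the Isr column: a partition is fully replicated once every broker in Replicas also appears in Isr (the order does not matter). A small Python sketch that checks this on describe output; the regex assumes the `Partition: … Leader: … Replicas: … Isr: …` layout shown above, and `under_replicated` is a hypothetical helper.

```python
import re

# Assumed layout: "Partition: <id> Leader: <id> Replicas: <ids> Isr: <ids>",
# fields separated by whitespace (spaces or tabs).
PARTITION_RE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+"
    r"Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def under_replicated(describe_output):
    """Return the partition ids whose ISR does not yet contain every
    assigned replica (hypothetical helper)."""
    lagging = []
    for m in PARTITION_RE.finditer(describe_output):
        partition = int(m.group(1))
        replicas = {int(r) for r in m.group(3).split(",")}
        isr = {int(r) for r in m.group(4).split(",") if r}
        # Every assigned replica must be in sync for the partition to count as done.
        if not replicas <= isr:
            lagging.append(partition)
    return lagging
```

An empty result for the output above means all four partitions have three in-sync replicas. kafka-reassign-partitions.sh can also be run with `--verify` and the same `--reassignment-json-file` to report the status of each partition's reassignment.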

What I don't understand is what happened to the partitions during this reassignment. Looking at the ConsumerOffsetChecker, this is what I saw before the reassignment:

 kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group myGroup --zookeeper zkhost1:2181 --topic myTopic

 Group    Topic    Pid  Offset  logSize  Lag  Owner
 myGroup  myTopic  0    925230  925230   0    none
 myGroup  myTopic  1    925230  925230   0    none
 myGroup  myTopic  2    925230  925230   0    none
 myGroup  myTopic  3    925230  925230   0    none

... and this is what I saw after the reassignment:

 Group    Topic    Pid  Offset  logSize  Lag  Owner
 myGroup  myTopic  0    23251   23252    1    none
 myGroup  myTopic  1    41281   41281    0    none
 myGroup  myTopic  2    23260   23260    0    none
 myGroup  myTopic  3    41270   41270    0    none
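In both tables the Lag column is simply logSize minus the group's committed Offset (for example 23252 − 23251 = 1 for partition 0). A minimal Python sketch that parses rows laid out as above (whitespace-separated, columns Group Topic Pid Offset logSize Lag Owner) and recomputes the lag; `parse_offset_checker` is a hypothetical helper, not part of Kafka.

```python
def parse_offset_checker(output):
    """Parse ConsumerOffsetChecker-style rows (hypothetical helper) into dicts
    with partition, offset, log_size and a recomputed lag."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 7-column data row.
        if len(fields) < 7 or not fields[2].isdigit():
            continue
        offset, log_size = int(fields[3]), int(fields[4])
        rows.append({
            "partition": int(fields[2]),
            "offset": offset,
            "log_size": log_size,
            "lag": log_size - offset,  # Lag = logSize - committed Offset
        })
    return rows


after = """Group Topic Pid Offset logSize Lag Owner
myGroup myTopic 0 23251 23252 1 none
myGroup myTopic 1 41281 41281 0 none
myGroup myTopic 2 23260 23260 0 none
myGroup myTopic 3 41270 41270 0 none"""

rows = parse_offset_checker(after)
print([r["lag"] for r in rows])
print(sum(r["log_size"] for r in rows))
```

Summed over all partitions, logSize goes from 4 × 925230 = 3700920 before the reassignment to 129063 afterwards.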

This raised a few questions for me:

  • Why is logSize now so much smaller? Does a reassignment trigger some kind of cleanup? (We did not set a byte limit.)
  • Why were all 4 partitions roughly the same size before the reassignment, whereas afterwards there is a big difference between partitions 0 and 2 on the one hand and 1 and 3 on the other? Shouldn't all partitions of the same topic have the same logSize, or am I misunderstanding the concept here?
  • Can something like this (e.g. reassigning partitions) lead to data loss? (In this case I could not see any on our consumer.) And if so, is there a way to do it without that risk?

Thanks for your answers and best wishes,

/ tehK


Source: https://habr.com/ru/post/1246618/

