Kafka consumer (Python) re-consumes all messages in the queue after a restart

I am using Kafka 0.8.1 and kafka-python 0.9.0. In my setup, I have two Kafka brokers. When I run my Kafka consumer, I can see it retrieving messages from the queue and keeping track of the offsets for both brokers. Everything works great!

My problem is that when I restart the consumer, it starts consuming messages from the very beginning. I expected that after a restart, the consumer would resume consuming from where it left off before it died.

I tried tracking message offsets in Redis and then calling consumer.seek before reading a message from the queue, to make sure I only get messages I have not seen before. Although this worked, before deploying this solution I wanted to check with y'all ... maybe I have misunderstood something about Kafka or the kafka-python client. A consumer being able to resume reading from where it left off seems like pretty basic functionality.
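For reference, the Redis workaround described above could look roughly like the sketch below. It is only an illustration under stated assumptions: a plain dict stands in for Redis, the helper names (offset_key, save_offset, resume) are hypothetical, and it assumes a single partition whose earliest available offset is still 0, so that an offset relative to the head equals the absolute offset.

```python
# Sketch only. `store` is a dict standing in for Redis; with a real Redis
# client the same idea would use its get/set calls instead of dict access.
# `consumer` is any object exposing a SimpleConsumer-style seek(offset, whence).


def offset_key(topic, partition):
    """Hypothetical key layout for the saved position."""
    return "offset:%s:%d" % (topic, partition)


def save_offset(store, topic, partition, offset):
    """Record the offset of the last message that was fully processed."""
    store[offset_key(topic, partition)] = offset


def resume(consumer, store, topic, partition):
    """Seek just past the last processed message, or to the head if none is saved."""
    saved = store.get(offset_key(topic, partition))
    next_offset = 0 if saved is None else int(saved) + 1
    # whence=0 means "relative to the earliest available offset"; this only
    # matches the absolute offset while the earliest offset is 0 (assumption).
    consumer.seek(next_offset, 0)
    return next_offset
```

Calling save_offset after each processed message and resume once at startup gives at-least-once behaviour: a message that was processed but not yet saved will be seen again after a crash.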

Thank!

+4
3 answers

Be careful with the kafka-python library. It has a few minor issues.

SimpleConsumer provides a seek method (https://github.com/mumrah/kafka-python/blob/master/kafka/consumer/simple.py#L174-L185) that lets you start consuming messages from whatever position you want.

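The most common calls are:

  • consumer.seek(0, 0), to start reading from the beginning of the queue.
  • consumer.seek(0, 1), to start reading from the current offset.
  • consumer.seek(0, 2), to skip all pending messages and read only new ones.

The first argument is an offset relative to that position. For example, if you call consumer.seek(5, 0), you skip the first 5 messages in the queue.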
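The meaning of the two seek arguments can be modelled in plain Python. This is not the library's implementation, just a sketch of the arithmetic: resolve_seek is a hypothetical helper, and the earliest, current, and latest positions are passed in by the caller rather than fetched from a broker.

```python
def resolve_seek(offset, whence, earliest, current, latest):
    """Return the absolute position that seek(offset, whence) would target.

    whence=0: relative to the earliest available offset (head of the queue)
    whence=1: relative to the consumer's current offset
    whence=2: relative to the latest known offset (tail of the queue)
    """
    bases = {0: earliest, 1: current, 2: latest}
    if whence not in bases:
        raise ValueError("whence must be 0, 1 or 2")
    return bases[whence] + offset


# With a head at 100, a current position of 150 and a tail at 200:
print(resolve_seek(5, 0, earliest=100, current=150, latest=200))   # 105
print(resolve_seek(0, 1, earliest=100, current=150, latest=200))   # 150
print(resolve_seek(-10, 2, earliest=100, current=150, latest=200)) # 190
```

A negative offset with whence=2 steps back from the tail, which is a handy way to re-read only the last few messages.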

+4

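kafka-python stores offsets with the Kafka server, not over a separate Zookeeper connection. Unfortunately, the Kafka server APIs for committing/fetching offsets were not fully functional until Apache Kafka 0.8.1.1. If you upgrade your Kafka server, your setup should work. I would also suggest upgrading kafka-python to 0.9.4.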

[ kafka-python]

+2

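A Kafka consumer can store its offsets in Zookeeper. The Java API offers two options: a high-level consumer, which manages that state for you and resumes where it left off after a restart, and a stateless low-level consumer, which does not.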

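From what I understand of the consumer code in Python (https://github.com/mumrah/kafka-python/blob/master/kafka/consumer.py), both SimpleConsumer and MultiProcessConsumer keep track of their current offsets in Zookeeper, so it is strange that you are seeing messages consumed again.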

Also check the following consumer options:

auto_commit: default True. Whether or not to auto commit the offsets
auto_commit_every_n: default 100. How many messages to consume
                     before a commit
auto_commit_every_t: default 5000. How much time (in milliseconds) to
                     wait before commit

Perhaps you consume < 100 messages, or run for < 5000 ms, before the restart?
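To see why a short-lived consumer might never commit with the defaults quoted above, here is a simplified model of the two thresholds. It is a sketch, not kafka-python's code: should_commit is a hypothetical helper, and it treats the count and time limits as a single check, whereas a real client may drive the time limit from a separate timer.

```python
def should_commit(messages_since_commit, ms_since_commit,
                  every_n=100, every_t=5000):
    """Return True once either auto-commit threshold has been reached.

    every_n and every_t mirror the defaults of auto_commit_every_n and
    auto_commit_every_t quoted above.
    """
    return messages_since_commit >= every_n or ms_since_commit >= every_t


# A consumer that read 40 messages in 2 seconds and was then killed:
print(should_commit(40, 2000))              # False, nothing was committed
# The same run with a commit after every message:
print(should_commit(40, 2000, every_n=1))   # True
```

Under this model, lowering auto_commit_every_n (at the cost of more commit traffic) is one thing to try when a consumer is restarted before either default threshold is reached.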

0

Source: https://habr.com/ru/post/1547811/