What is the difference between a topic and a topic replica in a Kafka cluster?

What is the difference between a topic and a topic replica in a Kafka cluster? I mean, both of them keep copies of the messages in the topic. So what is the real difference?

5 answers

When you add a message to a topic, you call the send(KeyedMessage message) method of the producer API. This means your message consists of a key and a value. When you create a topic, you specify how many partitions it should have. When you call send, the data goes to exactly one partition, chosen (by default) from the hash of your key. Each partition can have replicas, which means the partition and its replicas store the same data. The catch is that, by default, both your producer and your consumer work only with the leader replica; the other copies are used only as backups.
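To make "one partition chosen from the hash of the key" concrete, here is a minimal Python sketch. The helper name choose_partition is hypothetical, and zlib.crc32 is only a stand-in hash: Kafka's own default partitioner uses a different function (murmur2 in the Java client), so the partition numbers below will not match a real cluster. What carries over is the shape of the rule: hash the key, take it modulo the partition count.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Hypothetical helper: map a message key to a partition number.

    Hash the key, then take the result modulo the partition count, so every
    message with the same key goes to the same partition. crc32 is only a
    stand-in here; Kafka clients use their own hash function.
    """
    if num_partitions <= 0:
        raise ValueError("a topic needs at least one partition")
    # zlib.crc32 returns a non-negative int in Python 3, so the modulo is safe.
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # "user-1" appears twice and is routed to the same partition both times.
    for key in [b"user-1", b"user-2", b"user-1"]:
        print(key.decode(), "->", choose_partition(key, num_partitions=3))
```

Because the mapping depends only on the key and the partition count, adding partitions to an existing topic changes where keys land, which is one reason the partition count is worth choosing carefully up front.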

Refer to the documentation: http://kafka.apache.org/documentation.html#producerapi and the basic training slides: http://www.slideshare.net/miguno/apache-kafka-08-basic-training-verisign


Topics are partitioned across multiple nodes, so a topic can grow beyond the limits of a single node. Partitions are replicated for fault tolerance. Replication with leader election and failover is one of the biggest differences between Kafka and other brokers / Flume. From the Apache Kafka website:

Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
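The leader/follower behaviour in that quote can be modelled in a few lines. This is a toy Python sketch, not Kafka's controller logic: the class Partition and its methods are hypothetical, and on failover it simply promotes the first surviving replica, whereas real Kafka normally chooses the new leader from the in-sync replica set.

```python
from typing import List, Optional, Set


class Partition:
    """Toy model of one partition's replica set (not Kafka's controller logic)."""

    def __init__(self, replicas: List[int]) -> None:
        if not replicas:
            raise ValueError("a partition needs at least one replica")
        self.replicas = list(replicas)        # broker ids holding a copy
        self.alive: Set[int] = set(replicas)  # brokers currently up
        self.leader: Optional[int] = self.replicas[0]  # first replica starts as leader

    def handle_request(self) -> int:
        """All reads and writes go to the leader; return the broker serving them."""
        if self.leader is None:
            raise RuntimeError("partition offline: no live replica left")
        return self.leader

    def broker_failed(self, broker: int) -> None:
        """Mark a broker as dead; if it led this partition, promote a live follower."""
        self.alive.discard(broker)
        if broker == self.leader:
            # Simplification: take the first surviving replica. Real Kafka
            # normally chooses from the in-sync replica set.
            self.leader = next((b for b in self.replicas if b in self.alive), None)


if __name__ == "__main__":
    p = Partition([1, 2, 3])
    print("leader:", p.handle_request())
    p.broker_failed(1)
    print("leader after broker 1 fails:", p.handle_request())
```

Note that the followers never serve requests in this model; they only matter once the leader disappears, which is exactly the "backup" role the answers describe.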


Two of Kafka's important features are parallelism and redundancy. Kafka provides them by giving each topic a configurable number of partitions and replicas.

Partitions

Partitions: a single slice of a Kafka topic. The number of partitions is configurable for each topic. More partitions allow more parallelism when reading from a topic, because the number of partitions caps how many consumers in a consumer group can read at the same time. For example, if a topic has 3 partitions, you can have 3 consumers in a group, splitting the partitions between them, which gives you a parallelism of 3. The right number of partitions is hard to pick until you know how fast you produce data and how fast you consume it. If you know a topic will be large, you will want more partitions.
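The "parallelism of 3" example can be checked with a small sketch of how a group divides partitions among its members. The function assign_partitions is hypothetical and uses a plain round-robin rule; Kafka ships several real assignment strategies (range, round-robin, sticky), and this is not their code. It only shows the key constraint: each partition belongs to exactly one consumer in the group.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Hypothetical round-robin assignment of a topic's partitions to a consumer group.

    Each partition is owned by exactly one consumer in the group, so any
    consumers beyond the partition count end up with nothing to read.
    """
    if not consumers:
        raise ValueError("the consumer group is empty")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


if __name__ == "__main__":
    # 3 partitions, 4 consumers: one consumer stays idle, so parallelism is 3.
    print(assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"]))
```

With fewer consumers than partitions, some consumers simply own more than one partition; with more consumers than partitions, the extras sit idle.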

Replicas

Replicas: these are copies of partitions. Clients never write to them and, by default, never read from them either; their sole purpose is data redundancy. If your topic has n replicas, n-1 brokers can fail before any data is lost. Also, a topic cannot have a replication factor greater than the number of brokers you have. For example, with 5 Kafka brokers, a topic can have a replication factor of at most 5, and then 5-1 = 4 brokers can go down before data is lost.
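Both rules in that paragraph (replication factor at most the broker count, n-1 failures tolerated) fit in a short Python sketch. The helpers place_replicas and failures_tolerated are hypothetical; the placement is a simplified round-robin, not Kafka's actual (rack-aware) algorithm. Shifting the starting broker per partition also spreads the first replica, the preferred leader, across brokers, in the spirit of "each server acts as a leader for some of its partitions and a follower for others".

```python
from typing import List


def place_replicas(num_partitions: int, brokers: List[int], replication_factor: int) -> List[List[int]]:
    """Give each partition `replication_factor` distinct brokers (first = preferred leader).

    Simplified round-robin placement, shifted by one broker per partition.
    Not Kafka's actual rack-aware assignment.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > len(brokers):
        # Each copy must live on a different broker, so this cannot work.
        raise ValueError(
            f"replication factor {replication_factor} exceeds broker count {len(brokers)}"
        )
    n = len(brokers)
    return [
        [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    ]


def failures_tolerated(replication_factor: int) -> int:
    """With n copies, up to n - 1 of the brokers holding them can be lost."""
    return replication_factor - 1


if __name__ == "__main__":
    for p, replicas in enumerate(place_replicas(3, [1, 2, 3, 4, 5], 3)):
        print(f"partition {p}: leader {replicas[0]}, followers {replicas[1:]}")
    print("failures tolerated with RF=5:", failures_tolerated(5))
```

Kafka itself likewise refuses to create a topic whose replication factor exceeds the number of available brokers.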


Kafka topics are divided into several partitions. Every record written to a topic lands in one specific partition, where it is assigned, and identified by, an offset that is unique within that partition. Replication is done at the partition level: a redundant copy of a topic partition is called a replica. The logic that chooses the partition for a message is customizable. Partitions make it possible to read and write data in parallel by splitting it across several partitions distributed among several brokers. For each partition, one server acts as the leader and the others as followers. The leader handles reads and writes while the followers copy the data. If the leader fails, one of the followers is elected as the new leader.
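The point that offsets are per partition is easy to miss, so here is a toy Python sketch. The class TopicLog is hypothetical and ignores everything a real broker does (on-disk segments, replication, retention); it only shows that each partition is its own append-only sequence, so a record is identified by the pair (partition, offset), not by the offset alone.

```python
from typing import Dict, List


class TopicLog:
    """Toy append-only log for one topic: one list of records per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: Dict[int, List[bytes]] = {p: [] for p in range(num_partitions)}

    def append(self, partition: int, value: bytes) -> int:
        """Append a record and return its offset, i.e. its position in that partition."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> bytes:
        """Look up a record by (partition, offset)."""
        return self.partitions[partition][offset]


if __name__ == "__main__":
    topic = TopicLog(num_partitions=2)
    print(topic.append(0, b"a"), topic.append(1, b"b"), topic.append(0, b"c"))
```

Here offset 0 exists in both partitions, and order is guaranteed only within a single partition's list, which matches the usual statement that Kafka orders records per partition rather than per topic.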

Hope this explains!

  • partition: each topic can be divided into partitions for load balancing (you can write to different partitions at the same time) and scalability (a topic can grow beyond the limits of a single instance); within a single partition, records are ordered;

  • replica: mainly for fault tolerance;

Quote:

The partitions of the log are distributed over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.

There is a fairly intuitive guide that explains some fundamental Kafka concepts: https://www.tutorialspoint.com/apache_kafka/apache_kafka_fundamentals.htm

There is also a description of the Kafka workflow that may help clear up the confusion: https://www.tutorialspoint.com/apache_kafka/apache_kafka_workflow.htm


Source: https://habr.com/ru/post/978731/

