What is the best way to push Kafka messages from my edge nodes?

I have a worker in our main region (US-East) that aggregates traffic data from our edge locations. I want to ship data from those edge regions to Kafka in the main region.

For example: Poland, Australia, US-West. I want to push all this data to US-East, but I do not want to incur the additional write latency of going from the edge regions to the primary on the request path.

Another option is to run a separate Kafka cluster per region that acts as a relay. But that would require us to operate individual clusters in each region and would add significantly more complexity to our deployments.

I have looked at MirrorMaker, but I do not really want full mirroring; I think what I am after is more of a relay. If there is no established way to do this, how else can I aggregate all of our application metrics into a primary region to be processed and sorted?

Thank you for your time.

2 answers

As far as I know, here are your options:

  • Run a local Kafka cluster in each region and have your edge nodes write to it, giving you low-latency writes. From there, run a mirroring process (e.g. MirrorMaker) that pulls data from each local Kafka cluster into your remote Kafka cluster for aggregation.
  • If you are concerned about blocking the application request path with high-latency requests, configure your producers to write asynchronously (non-blocking) to your remote Kafka cluster. Depending on your programming language and client, this can be a simple or a complex exercise.
  • Run a relay service (or data buffer) on each host, which can be as simple as a log file plus a daemon that pushes to your remote Kafka cluster. Alternatively, run a single-node Kafka/ZooKeeper container on each host (Docker images that bundle the two exist) to buffer data for later pulling.
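The non-blocking pattern in the second option can be sketched without any particular Kafka client: enqueue the record and let a background thread do the slow cross-region send. This is a minimal illustration of the pattern, not a specific library's API; `send_fn` stands in for a real producer call such as kafka-python's `KafkaProducer.send`, and all names here are illustrative.

```python
# Sketch: decouple the request path from a high-latency remote write.
# send_fn stands in for a real client call (e.g. KafkaProducer.send);
# names are illustrative, not a specific library's API.
import queue
import threading

class AsyncRelay:
    def __init__(self, send_fn, maxsize=10000):
        self.q = queue.Queue(maxsize=maxsize)
        self.send_fn = send_fn
        # Daemon thread drains the buffer in the background.
        threading.Thread(target=self._drain, daemon=True).start()

    def publish(self, record):
        # Returns immediately; drops the record if the buffer is full
        # rather than blocking the request path.
        try:
            self.q.put_nowait(record)
            return True
        except queue.Full:
            return False

    def _drain(self):
        while True:
            record = self.q.get()
            try:
                self.send_fn(record)  # the slow cross-region call
            finally:
                self.q.task_done()
```

Note that some clients already buffer like this internally: kafka-python's `KafkaProducer.send`, for instance, is itself asynchronous and returns a future, so with such a client the built-in batching (plus compression, to shrink payloads over the WAN) may be all you need.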

Option 1 is definitely the most standard solution to this problem, though a bit heavyweight. I suspect more tooling from Confluent and the Kafka community will emerge over time to support option 3.


Write messages to a local log file on disk. Write a small daemon that reads the log file and forwards the events to the main Kafka cluster.

To increase throughput and limit the effect of latency, you can also rotate the log file every minute, then rsync the rotated files to your main Kafka region with cron and run the import daemon there.


Source: https://habr.com/ru/post/1259173/
