We have started consolidating event log data from our applications by publishing it to a Kafka topic. Although we could write directly from the applications to Kafka, we decided to treat this as a general problem and use a Flume agent. That gives us some flexibility: if we later want to capture something else from a server, we can simply add another source and publish to another Kafka topic.
We created a configuration file for a Flume agent that tails the log file and publishes each entry to a Kafka topic:
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
tier1.sources.source1.type = exec
tier1.sources.source1.command = tail -F /var/log/some_log.log
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000
tier1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
tier1.sinks.sink1.topic = some_log
tier1.sinks.sink1.brokerList = hadoop01.com:9092,hadoop02.com:9092,hadoop03.com:9092
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.batchSize = 20
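For reference, an agent with this configuration can be started with the standard flume-ng launcher. This is a sketch: the file name tier1.conf and the conf directory are assumptions, and the console logger flag is only there for debugging.

flume-ng agent --conf ./conf --conf-file tier1.conf --name tier1 -Dflume.root.logger=INFO,console

To confirm that entries are arriving, kafka-console-consumer.sh can be pointed at the topic (assuming ZooKeeper runs on hadoop01:2181):

kafka-console-consumer.sh --zookeeper hadoop01:2181 --topic some_log --from-beginning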
Unfortunately, the messages themselves do not identify the host that produced them. If an application runs on several hosts and an error occurs, we have no way of telling which host the message came from.
We also pull the data back off the topic with Flume and write it to HDFS, using the Flume HDFS sink. The message body is opaque to Kafka, i.e. it is just a byte payload, so anything that identifies the host has to travel inside the message itself.

Is there a way to have Flume add the originating host to each message, either in the body or as a header?
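For what it's worth, the closest built-in mechanism appears to be Flume's host interceptor, which stamps each event with a header carrying the agent's host name. A minimal sketch of wiring it into the source above (the interceptor name i1 is arbitrary; the property names are from the Flume interceptor documentation):

tier1.sources.source1.interceptors = i1
# "host" is Flume's built-in host interceptor
tier1.sources.source1.interceptors.i1.type = host
# store the host name rather than the IP address
tier1.sources.source1.interceptors.i1.useIP = false
# event header key to write the host into ("host" is the default)
tier1.sources.source1.interceptors.i1.hostHeader = host

The catch is that this only sets a Flume event header, and the Kafka sink appears to write only the event body to the topic by default, so the header would be lost unless whole events are serialized (newer Flume versions expose a useFlumeEventFormat option on the Kafka sink) or the host is injected into the body itself.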