Migrating a log file to HDFS using Flume while it is being written

What is the best way to get a log file into HDFS while it is still being written? I am trying to tune Apache Flume and I am looking for sources that can offer data reliability. I tried configuring "exec" and also looked at "spooldir", but the following documentation on flume.apache.org cast doubt on my approach:

Exec source:

One of the most commonly requested features is the use case such as "tail -F file_name", where an application writes to a log file on disk and Flume tails the file, sending each line as an event. While this use case is possible, there is an obvious problem: what happens if the channel fills up and Flume can't send an event? Flume has no way of indicating to the application writing the log file that it needs to retain the log, or that the event hasn't been sent for some reason. Your application can never guarantee that data has been received when using a unidirectional asynchronous interface such as ExecSource!
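For concreteness, the kind of exec setup the documentation is warning about looks something like this (the agent name, channel name, and log path below are just placeholders of my own):

    # exec source tailing a log file; every line becomes one Flume event
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/myapp/app.log
    agent1.sources.src1.channels = ch1
    # in-memory channel: fast, but events are lost if the agent dies or the channel fills up
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000

If ch1 fills up here, tail -F keeps running and the lines it emits in the meantime are simply dropped, which is exactly the problem described above.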

Spooling Directory source:

Unlike the Exec source, the spooldir source is reliable and will not miss data, even if Flume is restarted or killed. In exchange for this reliability, only immutable files may be dropped into the spooling directory. If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
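The spooldir alternative I looked at would be along these lines (again, names and paths are placeholders):

    # spooling directory source: only completed (immutable) files may be dropped here
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sources.src1.type = spooldir
    agent1.sources.src1.spoolDir = /var/log/myapp/spool
    agent1.sources.src1.channels = ch1
    # file channel keeps events on disk so they survive an agent restart
    agent1.channels.ch1.type = file

The catch, as the quote says, is that the file being actively written cannot live in /var/log/myapp/spool, so this only ships a log after it has been rotated, not while it is being written.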

Is there anything better I can use so that Flume does not miss any events and also reads them in near real time?
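Whichever source turns out to be the right choice, the HDFS side of my agent would stay roughly the same; something like the following sketch (the namenode address, path pattern, and sink name are placeholders):

    # HDFS sink writing plain text files, bucketed by day
    agent1.sinks = hdfs1
    agent1.sinks.hdfs1.channel = ch1
    agent1.sinks.hdfs1.type = hdfs
    agent1.sinks.hdfs1.hdfs.path = hdfs://namenode:8020/flume/logs/%Y-%m-%d
    agent1.sinks.hdfs1.hdfs.fileType = DataStream
    # use the agent's clock for the %Y-%m-%d escape instead of requiring a timestamp header
    agent1.sinks.hdfs1.hdfs.useLocalTimeStamp = true

Only the source part is in question; the channel and sink can presumably stay the same either way.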

Source: https://habr.com/ru/post/1608380/

