Flume HDFS: remove timestamp from file name

I configured a Flume agent for my application where the source is a spooling directory (spooldir) and the sink is HDFS.

The files are collected in HDFS successfully.

agent configuration:

    agent.sources = src-1
    agent.channels = c1
    agent.sinks = k1
    agent.sources.src-1.type = spooldir
    agent.sources.src-1.channels = c1
    agent.sources.src-1.spoolDir = /home/Documents/id/
    agent.sources.src-1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
    agent.sources.src-1.fileHeader = true
    agent.sources.src-1.basenameHeader = true
    agent.sources.src-1.basenameHeaderKey = basename
    agent.channels.c1.type = file
    agent.sinks.k1.type = hdfs
    agent.sinks.k1.channel = c1
    agent.sinks.k1.hdfs.path = hdfs://localhost:8020/user/flume/events/
    agent.sinks.k1.hdfs.filePrefix = %{basename}
    agent.sinks.k1.hdfs.fileHeader = true
    agent.sinks.k1.hdfs.fileType = DataStream

The resulting files in HDFS look like this:

    /flume/events/file1.txt.1411543838171
    /flume/events/file2.txt.1411544272696

Is there a way to remove the timestamp (1411543838171), the unique number that is automatically generated for each event, from the file name?

1 answer

It seems that it is not possible to remove the timestamp through configuration. If you look at how the HDFS sink builds file names, you will find the following:

    long counter = fileExtensionCounter.incrementAndGet();
    String fullFileName = fileName + "." + counter;

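Here fileExtensionCounter is initialized as fileExtensionCounter = new AtomicLong(clock.currentTimeMillis()), so the suffix starts from the current time in milliseconds and is incremented for each new file.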
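
To see why the suffix looks like a timestamp, the two quoted lines can be reproduced outside Flume. This is only a minimal sketch, not the actual sink code: the class FileNameCounterDemo, the method nextFileName and the fixed seed value are made up for illustration, and the real sink seeds the counter from its own clock.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-alone demo; only the two lines inside nextFileName
// correspond to the code quoted from the HDFS sink.
class FileNameCounterDemo {
    private final AtomicLong fileExtensionCounter;

    // The sink seeds this with clock.currentTimeMillis(); a fixed seed is
    // passed in here so the output is reproducible.
    FileNameCounterDemo(long seedMillis) {
        this.fileExtensionCounter = new AtomicLong(seedMillis);
    }

    // Appends "." plus the incremented counter to the given file name.
    String nextFileName(String fileName) {
        long counter = fileExtensionCounter.incrementAndGet();
        return fileName + "." + counter;
    }

    public static void main(String[] args) {
        FileNameCounterDemo demo = new FileNameCounterDemo(1411543838170L);
        System.out.println(demo.nextFileName("file1.txt")); // file1.txt.1411543838171
        System.out.println(demo.nextFileName("file1.txt")); // file1.txt.1411543838172
    }
}
```

The suffix is appended unconditionally, which is why no value of hdfs.filePrefix can make it disappear.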

You can check the source code of the HDFS sink and of its bucket writer for the details.

If you want to write more events into one file, you can look at these properties of the sink:

  • rollInterval
  • rollSize
  • rollCount
  • batchSize
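
As a rough sketch of how these settings could be added to the configuration from the question: the values below are illustrative assumptions, not recommendations. In the Flume user guide the time-based setting is spelled hdfs.rollInterval (in seconds), and a value of 0 disables rolling on that criterion.

```properties
# Roll the file every 5 minutes (illustrative value)
agent.sinks.k1.hdfs.rollInterval = 300
# Do not roll based on file size in bytes
agent.sinks.k1.hdfs.rollSize = 0
# Do not roll based on number of events
agent.sinks.k1.hdfs.rollCount = 0
# Number of events written before flushing to HDFS
agent.sinks.k1.hdfs.batchSize = 100
```

This only changes how many events end up in each file; the numeric suffix is still appended to every file name.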

Source: https://habr.com/ru/post/1236398/

