I'm trying to configure Flume to roll log files hourly, or when they reach the default HDFS block size (64 MB), whichever comes first. Below is my current configuration:
imp-agent.channels.imp-ch1.type = memory
imp-agent.channels.imp-ch1.capacity = 40000
imp-agent.channels.imp-ch1.transactionCapacity = 1000

imp-agent.sources.avro-imp-source1.channels = imp-ch1
imp-agent.sources.avro-imp-source1.type = avro
imp-agent.sources.avro-imp-source1.bind = 0.0.0.0
imp-agent.sources.avro-imp-source1.port = 41414
imp-agent.sources.avro-imp-source1.interceptors = host1 timestamp1
imp-agent.sources.avro-imp-source1.interceptors.host1.type = host
imp-agent.sources.avro-imp-source1.interceptors.host1.useIP = false
imp-agent.sources.avro-imp-source1.interceptors.timestamp1.type = timestamp

imp-agent.sinks.hdfs-imp-sink1.channel = imp-ch1
imp-agent.sinks.hdfs-imp-sink1.type = hdfs
imp-agent.sinks.hdfs-imp-sink1.hdfs.path = hdfs://mynamenode:8020/flume/impressions/yr=%Y/mo=%m/d=%d/logger=%{host}s1/
imp-agent.sinks.hdfs-imp-sink1.hdfs.filePrefix = Impr
imp-agent.sinks.hdfs-imp-sink1.hdfs.batchSize = 10
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollInterval = 3600
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollCount = 0
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollSize = 66584576

imp-agent.channels = imp-ch1
imp-agent.sources = avro-imp-source1
imp-agent.sinks = hdfs-imp-sink1
My intention with the above configuration is to write to HDFS in batches of 10 events and to roll the file being written every hour. What I actually see is that all data appears to stay in memory until the file either reaches 64 MB or rolls over after 1 hour. Are there any settings I need to change to get the desired behavior?
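To make the intended behavior concrete, here is a minimal Python sketch of the roll rule I expect from the roll settings in the configuration above. It is only a model of my understanding, not Flume's implementation: `should_roll` and its parameters are hypothetical names, and the only assumption taken from Flume is that a value of 0 disables a trigger. The last lines also check the size value: 66584576 bytes is 63.5 MiB, slightly below a full 64 MiB (67108864 bytes).

```python
# Values copied from the sink configuration.
ROLL_INTERVAL_S = 3600       # hdfs.rollInterval, in seconds
ROLL_SIZE_BYTES = 66584576   # hdfs.rollSize, in bytes
ROLL_COUNT = 0               # hdfs.rollCount, 0 disables count-based rolling

MIB = 1024 * 1024


def should_roll(age_s, size_bytes, event_count,
                interval=ROLL_INTERVAL_S,
                size=ROLL_SIZE_BYTES,
                count=ROLL_COUNT):
    """Hypothetical model: roll when any enabled trigger is reached."""
    by_time = interval > 0 and age_s >= interval
    by_size = size > 0 and size_bytes >= size
    by_count = count > 0 and event_count >= count
    return by_time or by_size or by_count


# Sanity check on the configured size against a 64 MiB block.
print(ROLL_SIZE_BYTES / MIB)   # 63.5
print(64 * MIB)                # 67108864
```

Under this model, a file that is 10 minutes old and 1 MB in size should stay open, while the same file should roll as soon as it is an hour old or reaches the configured size.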