Why is the Flume-NG HDFS sink not writing to the file when the number of events equals or exceeds the batch size?

I'm trying to configure Flume to roll log files hourly or when they reach the default HDFS block size (64 MB). Below is my current configuration:

imp-agent.channels = imp-ch1
imp-agent.sources = avro-imp-source1
imp-agent.sinks = hdfs-imp-sink1

imp-agent.channels.imp-ch1.type = memory
imp-agent.channels.imp-ch1.capacity = 40000
imp-agent.channels.imp-ch1.transactionCapacity = 1000

imp-agent.sources.avro-imp-source1.channels = imp-ch1
imp-agent.sources.avro-imp-source1.type = avro
imp-agent.sources.avro-imp-source1.bind = 0.0.0.0
imp-agent.sources.avro-imp-source1.port = 41414
imp-agent.sources.avro-imp-source1.interceptors = host1 timestamp1
imp-agent.sources.avro-imp-source1.interceptors.host1.type = host
imp-agent.sources.avro-imp-source1.interceptors.host1.useIP = false
imp-agent.sources.avro-imp-source1.interceptors.timestamp1.type = timestamp

imp-agent.sinks.hdfs-imp-sink1.channel = imp-ch1
imp-agent.sinks.hdfs-imp-sink1.type = hdfs
imp-agent.sinks.hdfs-imp-sink1.hdfs.path = hdfs://mynamenode:8020/flume/impressions/yr=%Y/mo=%m/d=%d/logger=%{host}s1/
imp-agent.sinks.hdfs-imp-sink1.hdfs.filePrefix = Impr
imp-agent.sinks.hdfs-imp-sink1.hdfs.batchSize = 10
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollInterval = 3600
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollCount = 0
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollSize = 66584576

My intention with the above configuration is to write events to HDFS in batches of 10 and to roll the resulting file every hour. What I am actually seeing is that the data appears to be held in memory until either the file reaches 64 MB or the hourly roll kicks in. Are there any settings I need to change to get the behavior I want?
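For reference, my understanding of how the roll-related sink settings interact, based on the Flume HDFS sink documentation (the comments are mine):

# hdfs.batchSize: events written to the file per flush; flushing does not close the file
imp-agent.sinks.hdfs-imp-sink1.hdfs.batchSize = 10
# hdfs.rollInterval: seconds after which the current file is rolled (closed and renamed)
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollInterval = 3600
# hdfs.rollCount: events per file before rolling; 0 disables count-based rolls
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollCount = 0
# hdfs.rollSize: file size in bytes that triggers a roll (66584576 bytes = 63.5 MB)
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollSize = 66584576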

1 answer

To answer my own question: Flume is in fact writing the data to HDFS in batches as configured. The file length shown by HDFS stays stale while the file is open, because HDFS only updates a file's reported length as blocks are completed; data written into the current, still-open block is not reflected until that block finishes or the file is closed.
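One way to see this for yourself (a sketch, not part of the original answer; the date, host, and file name below are hypothetical placeholders for the .tmp file the sink keeps open) is to compare the length the NameNode reports with the bytes that are actually readable:

# Length reported by the NameNode. For a file that is still open,
# this only reflects completed blocks, so it can show 0.
hadoop fs -ls /flume/impressions/yr=2014/mo=01/d=15/logger=myhosts1/

# Bytes readable right now. Data the sink has already flushed is
# typically readable even though -ls has not caught up.
hadoop fs -cat /flume/impressions/yr=2014/mo=01/d=15/logger=myhosts1/Impr.1389800000000.tmp | wc -c

If the second number grows while the first stays at 0, the sink is writing; the file will show its true size once it is rolled.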


Source: https://habr.com/ru/post/1480765/

