Using Flume to stream data to S3

I am trying to use Flume for something very simple: I would like to push the contents of my Apache access log files to S3. I managed to create a Flume agent that reads from the Apache access log file and writes to a logger sink. Now I am trying to find a way to replace the logger sink with an "S3 sink". (I know such a sink does not exist out of the box.)

I am looking for some pointers to set me on the right path. Below is the test properties file I am currently using.

a1.sources=src1
a1.sinks=sink1
a1.channels=ch1

#source configuration
a1.sources.src1.type=exec
a1.sources.src1.command=tail -f /var/log/apache2/access.log

#sink configuration
a1.sinks.sink1.type=logger

#channel configuration
a1.channels.ch1.type=memory
a1.channels.ch1.capacity=1000
a1.channels.ch1.transactionCapacity=100

#links
a1.sources.src1.channels=ch1
a1.sinks.sink1.channel=ch1
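For reference, an agent configured like this can be started with the standard flume-ng launcher (assuming the properties file above is saved as conf/test.properties; adjust paths to your installation):

```shell
# Start agent a1 with the configuration above and
# echo the logger sink's output to the console
flume-ng agent \
  --conf conf \
  --conf-file conf/test.properties \
  --name a1 \
  -Dflume.root.logger=INFO,console
```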
2 answers

S3 is not HDFS, but Hadoop's FileSystem layer supports S3 (the s3n scheme), so you can use the HDFS sink; you just have to point hdfs.path at your bucket, like this. Remember to replace AWS_ACCESS_KEY and AWS_SECRET_KEY.

agent.sinks.s3hdfs.type = hdfs
agent.sinks.s3hdfs.hdfs.path = s3n://<AWS.ACCESS.KEY>:<AWS.SECRET.KEY>@<bucket.name>/prefix/
agent.sinks.s3hdfs.hdfs.fileType = DataStream
agent.sinks.s3hdfs.hdfs.filePrefix = FilePrefix
agent.sinks.s3hdfs.hdfs.writeFormat = Text
agent.sinks.s3hdfs.hdfs.rollCount = 0
# 64 MB file size
agent.sinks.s3hdfs.hdfs.rollSize = 67108864
agent.sinks.s3hdfs.hdfs.batchSize = 10000
agent.sinks.s3hdfs.hdfs.rollInterval = 0
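Applied to the agent from the question, replacing the logger sink would look roughly like this (a sketch; <AWS.ACCESS.KEY>, <AWS.SECRET.KEY> and <bucket.name> are placeholders you must fill in):

```
a1.sinks.sink1.type = hdfs
a1.sinks.sink1.hdfs.path = s3n://<AWS.ACCESS.KEY>:<AWS.SECRET.KEY>@<bucket.name>/prefix/
a1.sinks.sink1.hdfs.fileType = DataStream
a1.sinks.sink1.hdfs.writeFormat = Text
a1.sinks.sink1.channel = ch1
```

Note that the HDFS sink needs the Hadoop client libraries (including the S3 filesystem support) on the Flume classpath.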

This makes sense, but can rollSize be combined with these settings?

agent_messaging.sinks.AWSS3.hdfs.round = true
agent_messaging.sinks.AWSS3.hdfs.roundValue = 30
agent_messaging.sinks.AWSS3.hdfs.roundUnit = minute
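For what it's worth, hdfs.round does not trigger file rolling by itself: it rounds the event timestamp down before it is substituted into escape sequences (such as %M) in hdfs.path, so all events within the same 30-minute window land under the same path. A rough sketch of that rounding (a hypothetical helper, not Flume code):

```python
from datetime import datetime

def round_down(ts: datetime, round_value: int = 30) -> datetime:
    """Round a timestamp down to the nearest round_value-minute
    boundary, mimicking hdfs.round/roundValue with roundUnit = minute."""
    return ts.replace(minute=ts.minute - ts.minute % round_value,
                      second=0, microsecond=0)

# Events at 10:07 and 10:29 fall in the same bucket; 10:31 starts a new one.
print(round_down(datetime(2015, 4, 1, 10, 7)))   # 2015-04-01 10:00:00
print(round_down(datetime(2015, 4, 1, 10, 31)))  # 2015-04-01 10:30:00
```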

Source: https://habr.com/ru/post/1203344/
