We have a .NET client application that uploads files to S3. An S3 event notification configured on the bucket invokes a Lambda function to process each file. When we need to perform maintenance, we suspend processing by deleting the event notification, and we add it back when we are ready to resume.
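The suspend/resume step could look roughly like the sketch below. It is written in Python against boto3-style S3 client methods for brevity (our real client is .NET). The function names and the `s3:ObjectCreated:*` event filter are assumptions. Note that `put_bucket_notification_configuration` replaces the bucket's whole notification configuration, so sending an empty configuration removes every notification on the bucket, not only the Lambda trigger.

```python
from typing import Any, Dict


def build_notification_config(lambda_arn: str) -> Dict[str, Any]:
    """Config that invokes the processing Lambda for every new object (assumed event filter)."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }


def suspend_processing(s3_client: Any, bucket: str) -> None:
    """Stop S3 from invoking Lambda by clearing the bucket's notification configuration.

    Caution: an empty configuration removes *all* notifications on the bucket.
    """
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration={}
    )


def resume_processing(s3_client: Any, bucket: str, lambda_arn: str) -> None:
    """Re-attach the Lambda trigger. Objects uploaded while suspended are NOT replayed."""
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_config(lambda_arn),
    )
```

Because re-adding the notification does not fire for objects that arrived while it was off, the backlog has to be fed to Lambda some other way.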
To handle the backlog of files that accumulate in S3 while the event notification is turned off, we write a record containing each file's S3 key to a Kinesis stream, and an event source mapping lets Lambda consume each Kinesis record. This works well for us because it lets us cap concurrency while working through a large backlog: the number of shards in the stream controls how many Lambda invocations run in parallel. We initially used SNS, but when thousands of files needed processing, SNS kept invoking Lambdas until we hit the concurrent execution limit, so we switched to Kinesis.
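A sketch of the backfill writer and of the concurrency bound it relies on, again in Python with a boto3-style Kinesis client. The helper names, the JSON payload shape, and hashing the S3 key into the partition key are assumptions. The 500-record cap per `PutRecords` request and the 256-character partition-key limit come from the Kinesis API. By default an event source mapping processes one batch per shard at a time, and `ParallelizationFactor` (1 to 10) raises that per-shard limit.

```python
import hashlib
import json
import time
from typing import Any, Dict, Iterable, List

MAX_RECORDS_PER_REQUEST = 500  # PutRecords accepts at most 500 records per call


def max_concurrent_invocations(shards: int, parallelization_factor: int = 1) -> int:
    """Upper bound on concurrent Lambda invocations from a Kinesis event source mapping."""
    return shards * parallelization_factor


def to_record(bucket: str, key: str) -> Dict[str, Any]:
    # Hypothetical payload shape: the consumer Lambda parses bucket and key from JSON.
    payload = json.dumps({"bucket": bucket, "key": key}).encode("utf-8")
    # S3 keys can exceed the 256-char partition-key limit, so hash them;
    # the hash also spreads records evenly across shards.
    partition_key = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return {"Data": payload, "PartitionKey": partition_key}


def put_batch(kinesis: Any, stream_name: str, records: List[Dict[str, Any]],
              max_attempts: int = 5) -> None:
    """Send one batch, resubmitting only the entries PutRecords reports as failed."""
    pending = records
    for attempt in range(max_attempts):
        if attempt:
            time.sleep(0.1 * 2 ** attempt)  # simple exponential backoff before a retry
        response = kinesis.put_records(StreamName=stream_name, Records=pending)
        if response.get("FailedRecordCount", 0) == 0:
            return
        # Result entries line up with request entries; failed ones carry an ErrorCode.
        pending = [rec for rec, res in zip(pending, response["Records"])
                   if "ErrorCode" in res]
    raise RuntimeError(f"{len(pending)} records still failing after {max_attempts} attempts")


def backfill(kinesis: Any, stream_name: str, bucket: str, keys: Iterable[str]) -> int:
    """Write one record per S3 key; returns the number of PutRecords batches sent."""
    batch: List[Dict[str, Any]] = []
    batches = 0
    for key in keys:
        batch.append(to_record(bucket, key))
        if len(batch) == MAX_RECORDS_PER_REQUEST:
            put_batch(kinesis, stream_name, batch)
            batches += 1
            batch = []
    if batch:
        put_batch(kinesis, stream_name, batch)
        batches += 1
    return batches
```

The keys would typically come from listing the bucket (for example with a `list_objects_v2` paginator); that part is omitted here.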
The problem we face now is that Kinesis is very expensive for how little we use it. We receive 150 to 200 uploaded files per minute, and our Lambda takes about 15 seconds to process each one. If we pause processing for several hours, thousands of files pile up. We could easily work through them with a 128-shard stream, but that would cost about $1,400 a month, while our Lambda currently costs less than $300 a month. It seems terrible to increase our COGS by more than 400% just to control concurrency during a recovery scenario.
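The numbers in this paragraph can be checked with a few lines of arithmetic. The $0.015 per shard-hour rate (provisioned mode, US East) and the 730-hour month are assumptions rather than figures from our bill, and the 3-hour pause below is hypothetical. Steady-state concurrency follows Little's law: arrival rate times processing time.

```python
import math

SHARD_HOUR_USD = 0.015  # assumed provisioned-mode price per shard-hour; varies by region
HOURS_PER_MONTH = 730   # average hours in a month


def steady_state_concurrency(files_per_minute: float, seconds_per_file: float) -> float:
    """Little's law: concurrent executions = arrival rate x time each file spends in Lambda."""
    return files_per_minute * seconds_per_file / 60


def monthly_shard_cost(shards: int) -> float:
    """Shard-hour charge only; PUT payload units and Lambda cost are excluded."""
    return shards * SHARD_HOUR_USD * HOURS_PER_MONTH


def drain_minutes(backlog: int, concurrency: int, seconds_per_file: float,
                  incoming_per_minute: float) -> float:
    """Minutes to clear a backlog while new uploads keep arriving; inf if it never catches up."""
    throughput = concurrency * 60 / seconds_per_file  # files finished per minute
    net = throughput - incoming_per_minute
    return backlog / net if net > 0 else math.inf


print(steady_state_concurrency(200, 15))               # 50.0
print(round(monthly_shard_cost(128), 2))               # 1401.6
backlog = 3 * 60 * 200                                  # hypothetical 3-hour pause at 200 files/min
print(round(drain_minutes(backlog, 128, 15, 200), 1))  # 115.4
```

Under these assumptions, about 50 concurrent executions only keep pace with new uploads, so anything much smaller than that never drains the backlog, while 128 clears a 3-hour backlog in roughly two hours.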
I could keep the stream small by default and resize it on the fly before working through a large backlog, but resharding from 1 to 128 shards takes a long time. If we are recovering from an unplanned outage, we cannot afford to wait for the stream to finish resizing before we can use it. So my questions are:
Can someone recommend an alternative to Kinesis shards for controlling the upper bound on the number of Lambdas running in parallel as they work through the queue?
Is there something I'm missing that would let us use Kinesis more efficiently?
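On the resizing delay: part of why scaling from 1 to 128 shards is slow is that it cannot be done in a single step. Assuming the `UpdateShardCount` rule that one call can at most double the current shard count, and that each call must finish (stream back in ACTIVE status) before the next, the scale-up takes seven sequential calls. AWS also limits how many scaling operations a stream can perform in a rolling 24-hour period. The helper below is hypothetical and only plans the target counts; it does not call AWS.

```python
from typing import List


def reshard_plan(current: int, target: int) -> List[int]:
    """Target shard counts for successive UpdateShardCount calls when scaling up.

    Assumes each call may at most double the current shard count.
    """
    if current < 1 or target < current:
        raise ValueError("expects 1 <= current <= target")
    plan = []
    while current < target:
        current = min(current * 2, target)  # never overshoot the target
        plan.append(current)
    return plan


print(reshard_plan(1, 128))  # [2, 4, 8, 16, 32, 64, 128]
```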