This is not an answer to the question, but I would like to clarify the idea behind storing records according to the arrival time of the event.
First, a few words about streams. Kinesis is just a stream of data, and it has a concept of consumption. You can reliably consume a stream only by reading it sequentially. There is also the idea of checkpoints as a mechanism for suspending and resuming consumption. A checkpoint is simply a sequence number that identifies a position in the stream. By specifying this number, you can start reading the stream from a specific event.
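To make the checkpoint idea concrete, here is a minimal sketch. It does not use the Kinesis API: the `stream` list, the `read_from` helper and the sequence numbers are all made up for illustration, and a real consumer would persist the checkpoint somewhere durable.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical in-memory stand-in for a stream: (sequence_number, payload)
# pairs stored in arrival order. This is NOT the Kinesis API.
Record = Tuple[int, str]


def read_from(stream: List[Record], checkpoint: Optional[int]) -> Iterator[Record]:
    """Yield records strictly after the checkpoint, in sequence order."""
    for seq, payload in stream:
        if checkpoint is None or seq > checkpoint:
            yield seq, payload


stream = [(101, "a"), (102, "b"), (103, "c"), (104, "d")]

# First run: consume two records, then suspend and remember the position.
checkpoint = None
consumer = read_from(stream, checkpoint)
for _ in range(2):
    checkpoint, _payload = next(consumer)

# Later run: resume from the saved checkpoint without re-reading or skipping.
resumed = list(read_from(stream, checkpoint))
```

The point is that a single number is enough to resume, but only because the records are read in one fixed order.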
Now back to the default Firehose-to-S3 setup. Since a Kinesis stream holds data only for a limited time, you will most likely need to store the data from Kinesis somewhere in order to analyze it later. Setting up Firehose with an S3 destination does this right out of the box: it just stores the raw data from the stream in an S3 bucket. Logically, though, this data still represents the same stream of records. To be able to reliably consume (read) this stream, we need those sequence numbers for checkpoints, and here the arrival time of the records plays that role.
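As a rough illustration of arrival time acting as a checkpoint: by default Firehose writes objects under a UTC `YYYY/MM/dd/HH/` prefix, so sorting keys as strings roughly follows arrival order. The keys and the `keys_after` helper below are hypothetical, and the sketch assumes the key list has already been fetched from the bucket.

```python
from typing import List, Optional


def keys_after(keys: List[str], checkpoint_key: Optional[str]) -> List[str]:
    """Return keys in arrival (lexicographic) order, skipping those already read."""
    ordered = sorted(keys)
    if checkpoint_key is None:
        return ordered
    return [k for k in ordered if k > checkpoint_key]


# Hypothetical object keys, listed out of order on purpose.
keys = [
    "2021/03/05/14/events-1-2021-03-05-14-07-10-aaaa",
    "2021/03/05/13/events-1-2021-03-05-13-59-02-bbbb",
    "2021/03/05/14/events-1-2021-03-05-14-01-33-cccc",
]

# The last object we finished processing serves as the checkpoint.
todo = keys_after(keys, "2021/03/05/13/events-1-2021-03-05-13-59-02-bbbb")
```

If the objects were laid out by creation time instead, a late event could land under a prefix that is already behind the checkpoint and would never be read.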
What if I want to read records by creation time? It seems the right way to do this is to read the S3 "stream" sequentially, load it into a database or time-series store, and then query records by creation time from that store. Otherwise there will always be a risk of skipping batches of events when reading from S3. That is why I would not suggest reordering the data in the S3 bucket at all.
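A minimal sketch of that approach, using an in-memory SQLite database as a stand-in for whatever store you choose. The table layout, field names and sample records are assumptions made for the example, not something Firehose produces.

```python
import sqlite3

# Hypothetical records in ARRIVAL order: (arrival_seq, created_at, payload).
# Note that creation time does not follow arrival order.
arrived = [
    (1, "2021-03-05T14:00:05Z", "c"),
    (2, "2021-03-05T13:59:58Z", "a"),
    (3, "2021-03-05T14:00:01Z", "b"),
    (4, "2021-03-05T14:10:00Z", "d"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "arrival_seq INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_events_created ON events (created_at)")

# Ingest sequentially, exactly as the records arrived.
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", arrived)

# Query by creation time; timestamps in one fixed ISO-8601 UTC format
# compare correctly as plain strings.
rows = conn.execute(
    "SELECT payload FROM events "
    "WHERE created_at >= ? AND created_at < ? ORDER BY created_at",
    ("2021-03-05T13:59:00Z", "2021-03-05T14:05:00Z"),
).fetchall()
by_creation = [row[0] for row in rows]
```

Ingestion stays sequential and checkpointable, while the index on `created_at` takes care of the reordering.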