I am evaluating AWS Kinesis for a data processing requirement that replaces an old batch ETL process with a streaming approach.
One of the key requirements for this project is the ability to reprocess data in cases where:
- A bug has been detected and fixed, and the application has been redeployed. Data needs to be reprocessed from the very beginning.
- New features have been added, and the history needs to be fully or partially reprocessed.
These scenarios are described very well for Kafka here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Data+%28Re%29Processing+Scenarios
I have seen the timestamp-based shard iterator (AT_TIMESTAMP) in Kinesis, and I think a tool similar to Kafka's reset utility could be built on top of the Kinesis API, but it would be great if something like that already existed. Even if it does not, it would be good to learn from those who have solved similar problems.
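To make the idea concrete, here is a minimal sketch of the kind of replay I have in mind, using the boto3 Kinesis client calls list_shards, get_shard_iterator and get_records. The function names list_shard_ids and replay_shard are my own and hypothetical, not part of any existing tool. The sketch assumes the stream is not resharded during the replay, ignores per-shard read throttling, and can only reach back as far as the stream's retention period allows.

```python
def list_shard_ids(client, stream_name):
    """Return all shard IDs of a stream, following ListShards pagination.

    `client` is assumed to be a boto3 Kinesis client, e.g. boto3.client("kinesis").
    """
    shard_ids = []
    kwargs = {"StreamName": stream_name}
    while True:
        resp = client.list_shards(**kwargs)
        shard_ids.extend(shard["ShardId"] for shard in resp["Shards"])
        token = resp.get("NextToken")
        if not token:
            return shard_ids
        # ListShards rejects StreamName when NextToken is supplied.
        kwargs = {"NextToken": token}


def replay_shard(client, stream_name, shard_id, start=None, batch_size=1000):
    """Yield records from one shard (hypothetical helper).

    start=None  -> TRIM_HORIZON: oldest record still retained in the shard.
    start=<ts>  -> AT_TIMESTAMP: first record at or after that timestamp.
    Stops once the consumer has caught up with the tip of the shard.
    """
    if start is None:
        iterator_args = {"ShardIteratorType": "TRIM_HORIZON"}
    else:
        iterator_args = {"ShardIteratorType": "AT_TIMESTAMP", "Timestamp": start}

    iterator = client.get_shard_iterator(
        StreamName=stream_name, ShardId=shard_id, **iterator_args
    )["ShardIterator"]

    while iterator:
        resp = client.get_records(ShardIterator=iterator, Limit=batch_size)
        yield from resp["Records"]
        if resp.get("MillisBehindLatest", 0) == 0:
            break  # caught up; a live consumer would keep polling instead
        iterator = resp.get("NextShardIterator")
```

A full replay would then loop over list_shard_ids(client, "my-stream") and feed each record from replay_shard(...) back into the fixed application ("my-stream" is a placeholder name). What this does not cover is exactly what the Kafka reset tool handles: resetting the application's stored checkpoints and clearing its intermediate state.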
So, does anyone know of existing resources, patterns, or tools for doing this in Kinesis?