I use Kafka, and we have a requirement for a fault-tolerant system in which not a single message may be lost. Here is the problem: if publishing to Kafka fails for any reason (ZooKeeper down, Kafka broker down, etc.), how can we safely hold on to those messages and replay them as soon as everything is restored? Again, as I said, we cannot afford to lose even one message. In addition, we need to know at any given time how many messages could not be published to Kafka, i.e. something like a counter, so that those messages can later be re-published.
One solution is to push these messages into a database that can handle such a load and also give us a very accurate count (for example Cassandra, where writes are very fast; but we also need a counter, and I believe Cassandra's counter feature is not reliable enough, so we don't want to use it).
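To make the idea concrete, here is a minimal, hypothetical sketch of the fallback pattern being described: wrap the publish call, park any message that fails in a local store, keep an exact pending counter, and drain the store once the broker is reachable again. The `Predicate<String>` stands in for the real Kafka `producer.send(...)` call, and the in-memory queue stands in for whatever durable store (database, write-ahead file) is eventually chosen; both are assumptions for illustration, not a concrete recommendation.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

// Sketch only: a publisher wrapper that never drops a message.
// On failure it parks the message in a fallback store and counts it exactly.
public class FallbackPublisher {
    private final Predicate<String> send;                       // stand-in for the real Kafka send
    private final Queue<String> fallback = new ArrayDeque<>();  // stand-in for a durable store
    private final AtomicLong pending = new AtomicLong();        // exact count of unpublished messages

    public FallbackPublisher(Predicate<String> send) {
        this.send = send;
    }

    public void publish(String msg) {
        if (!send.test(msg)) {            // broker down, ZooKeeper down, etc.
            fallback.add(msg);            // park for later replay
            pending.incrementAndGet();
        }
    }

    public long pendingCount() {
        return pending.get();             // "how many messages failed" at any given time
    }

    // Re-publish parked messages in order once Kafka is healthy again.
    public void replay() {
        while (!fallback.isEmpty()) {
            if (!send.test(fallback.peek())) {
                return;                   // still down: stop, keep the rest for next attempt
            }
            fallback.poll();
            pending.decrementAndGet();
        }
    }
}
```

In a real system the queue would have to be durable (otherwise a crash of the application itself loses the parked messages), which is exactly where the choice of backing store and its counter accuracy comes in.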
This question is more about the architecture, and only secondarily about which technology to use to make it happen.
PS: We process roughly 3000 TPS, so if the system goes down, the backlog of failed messages can grow very large in a very short time. We use Java-based frameworks.
Thanks for your help!