Non-deterministic functions in stream processing

Some systems, such as StreamScope, require functions to be deterministic, along with the order in which events are processed. This is because each message carries its own sequence number in the stream. After a failure, the sequence number is used to decide whether an event must be re-read or not (because it was already stored in the stream), so that downstream nodes do not compute the same event twice.
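
To make the mechanism concrete, here is a minimal Python sketch of sequence-number-based deduplication after a replay. It is not StreamScope's actual implementation; `Event`, `DownstreamNode`, and `upstream` are hypothetical names invented for illustration, and the sketch assumes a single upstream whose sequence numbers increase by one per event.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    seq: int      # sequence number assigned to the event in the stream
    payload: int


@dataclass
class DownstreamNode:
    """Hypothetical consumer that skips events it has already processed."""
    last_seq: int = -1
    total: int = 0
    processed: list = field(default_factory=list)

    def receive(self, event: Event) -> bool:
        # A replayed event carries a sequence number that was already seen.
        if event.seq <= self.last_seq:
            return False  # duplicate: do not process it again
        self.total += event.payload
        self.processed.append(event.seq)
        self.last_seq = event.seq
        return True


def upstream(inputs):
    """Deterministic upstream function: same inputs in the same order
    always yield the same (seq, payload) pairs."""
    return [Event(seq=i, payload=x * 2) for i, x in enumerate(inputs)]


node = DownstreamNode()

# First attempt: the upstream fails after emitting three of four events.
for e in upstream([1, 2, 3, 4])[:3]:
    node.receive(e)

# Recovery: the upstream recomputes and re-emits everything from the start.
accepted = [node.receive(e) for e in upstream([1, 2, 3, 4])]
```

Only the fourth event is accepted during recovery, so each payload is counted once. If `upstream` were non-deterministic, the recomputed event with a given sequence number could differ from the one already consumed, and skipping by sequence number would mix results from two different runs.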

Do Flink, Spark Streaming, Kafka-Streams, and Storm also require functions to be deterministic?

1 answer

Yes and no. It depends ;)

.

. . , ( ), . , , , .

, , .

Flink/Storm (Trident)/ ( ):

  • Flink exactly-once .
  • ,
    • , , . , , , , , ( ...)
  • Micro-batching Spark/Storm ( , , Flink )

Flink/Storm/Kafka-Streams ( ):

  • , . , , "" (- ).

Source: https://habr.com/ru/post/1661726/

