I want to achieve exactly-once semantics when writing from Spark Streaming to Kafka. But, as the docs say, "Output operations (like foreachRDD) have at-least once semantics, that is, the transformed data may get written to an external entity more than once in the event of a worker failure."
To perform transactional updates, the Spark documentation recommends using the batch time (available in foreachRDD) and the partition index of the RDD to create an identifier. This identifier uniquely identifies a blob of data in the streaming application. Code below:
import org.apache.spark.TaskContext

dstream.foreachRDD { (rdd, time) =>
  rdd.foreachPartition { partitionIterator =>
    val partitionId = TaskContext.get.partitionId()
    val uniqueId = generateUniqueId(time.milliseconds, partitionId)
    // use this uniqueId to transactionally commit the data in partitionIterator
  }
}
But how do I actually use this uniqueId with Kafka to make the writes transactional?
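For context, this is roughly what I have been trying inside foreachPartition: open a transactional KafkaProducer, reuse the uniqueId as its transactional.id, and commit or abort the whole partition as one transaction. It is only a sketch; the broker address, output topic, serializers, and the idea of using uniqueId as the transactional.id are my own guesses, not something from the docs:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

    // Sketch only: runs once per partition, inside rdd.foreachPartition.
    // Assumptions: broker at localhost:9092, output topic "output-topic",
    // records written as Strings, and uniqueId reused as the transactional.id.
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
    props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, uniqueId.toString)

    val producer = new KafkaProducer[String, String](props)
    producer.initTransactions()
    try {
      producer.beginTransaction()
      partitionIterator.foreach { record =>
        producer.send(new ProducerRecord[String, String]("output-topic", record.toString))
      }
      // Commit makes all records of this (batch time, partition) visible atomically.
      producer.commitTransaction()
    } catch {
      case e: Exception =>
        producer.abortTransaction()
        throw e
    } finally {
      producer.close()
    }

I am not sure whether one transactional.id per (batch time, partition) is what the documentation intends, or whether re-executed tasks would still produce duplicates.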
Thanks.