Spark Streaming: Queued Microbatch

I tried to put together a simple example of a windowed function using PySpark.

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = (SparkConf()
        .setMaster("local[2]")
        .setAppName("PythonStreamingKafka"))

sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 5)  # 5-second batch interval

# thing_topic and brokers are defined elsewhere
input_stream = KafkaUtils.createDirectStream(ssc, [thing_topic], {"metadata.broker.list": brokers})

def windowMetrics(rdd):
    print("Window RDD size:", rdd.count())

# 10-second window sliding every 5 seconds; both are multiples of the batch interval
messages = input_stream.map(lambda x: x[1]).window(10, 5)
messages.foreachRDD(windowMetrics)

ssc.start()
ssc.awaitTermination()

The data comes from Kafka. After launching the job, as the Spark UI screenshot below shows, only one batch is executed and all the other batches are queued. In fact, no processing happens inside the window, because Kafka provides no data for it to process (Input Size = 0).

[Screenshot: Spark UI showing a single active batch with the remaining batches queued]
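
To verify that the direct stream is actually receiving records, it can help to print the Kafka offset range of every microbatch. This is a minimal sketch, assuming the PySpark direct Kafka API (Spark 1.5+), where the batch RDDs produced by createDirectStream expose offsetRanges(); it must run on the raw stream, before map or window is applied:

def print_offsets(rdd):
    # offsetRanges() exists only on RDDs produced directly by
    # createDirectStream, before any transformation is applied
    for o in rdd.offsetRanges():
        print("topic=%s partition=%d offsets=[%d, %d)"
              % (o.topic, o.partition, o.fromOffset, o.untilOffset))

input_stream.foreachRDD(print_offsets)

If the ranges stay empty while the producer is writing, the problem is on the Kafka side; if they grow while Input Size is 0 only for the windowed stream, the window operation itself is the suspect.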

If the window operation is not used, everything works fine. The code is below.

messages = input_stream.map(lambda x: x[1])
messages.foreachRDD(windowMetrics)
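
For comparison, the per-window count can also be computed with countByWindow instead of calling foreachRDD on a windowed stream. This is only a sketch: countByWindow uses an inverse reduce internally, so checkpointing has to be enabled first, and the checkpoint directory below is a placeholder:

ssc.checkpoint("/tmp/spark-checkpoint")  # placeholder path; required by countByWindow

counts = input_stream.map(lambda x: x[1]).countByWindow(10, 5)
counts.pprint()  # one count per 10-second window, emitted every 5 seconds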

I searched around and found some related threads, but none of them describes exactly the same problem.

Spark Streaming: long queued/active batches

In that question, intermediate batches also end up queued.

https://forums.databricks.com/questions/1276/kafka-direct-api-from-spark-streaming-what-happens.html


Source: https://habr.com/ru/post/1648730/

