How to enable streaming from Cassandra to Spark?

I have the following spark assignment :

from __future__ import print_function
import os
import sys
import time
from random import random
from operator import add

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, Row
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark_cassandra import streaming, CassandraSparkContext

if __name__ == "__main__":
    conf = SparkConf().setAppName("PySpark Cassandra Test")
    sc = CassandraSparkContext(conf=conf)
    stream = StreamingContext(sc, 2)
    rdd = sc.cassandraTable("keyspace2", "users").collect()
    #print rdd
    stream.start()
    stream.awaitTermination()
    sc.stop()

When I run this, it gives me the following error :

 ERROR StreamingContext: Error starting the context, marking it as stopped
 java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute

The spark-submit command I run:

 ./bin/spark-submit --packages TargetHolding:pyspark-cassandra:0.2.4 examples/src/main/python/test/reading-cassandra.py

Comparing this with Spark Streaming from Kafka, my code is missing a line like the following:

 kafkaStream = KafkaUtils.createStream(stream, 'localhost:2181', "name", {'topic':1}) 

where createStream is what actually creates the input stream. For Cassandra I don't see anything like it in the docs. How do I start streaming from Cassandra into a Spark stream?

Versions:

 Cassandra v2.1.12
 Spark v1.4.1
 Scala 2.10
1 answer

To create a DStream from a Cassandra table, you can use ConstantInputDStream, providing the RDD created from the Cassandra table as its input. This causes the RDD to be materialized on every DStream batch interval.
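ConstantInputDStream is part of the Scala/Java Spark Streaming API and is not exposed in PySpark, so on the Python side the closest substitute is StreamingContext.queueStream with its default argument: once the queue is exhausted, the default RDD is re-emitted in every batch. Below is a minimal sketch, assuming pyspark-cassandra's cassandraTable and the keyspace2/users table from the question:

from pyspark import SparkConf
from pyspark.streaming import StreamingContext
from pyspark_cassandra import CassandraSparkContext

conf = SparkConf().setAppName("PySpark Cassandra Test")
sc = CassandraSparkContext(conf=conf)
ssc = StreamingContext(sc, 2)

# Define the RDD lazily; do not collect() it on the driver.
rdd = sc.cassandraTable("keyspace2", "users")

# Queue the RDD once and also pass it as the default, so every
# 2-second batch re-materializes the same Cassandra RDD
# (a stand-in for Scala's ConstantInputDStream).
stream = ssc.queueStream([rdd], default=rdd)

# An output operation is required; without one, start() fails with
# "No output operations registered, so nothing to execute".
stream.pprint()

ssc.start()
ssc.awaitTermination()

Note that the sketch registers an output operation (pprint()) before start(); the "No output operations registered" error in the question occurs precisely because the original code never defines an output operation on any DStream.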

Be aware that a large table, or a table that keeps growing, will adversely affect the performance of your streaming job, since the table is re-read on every interval.

See also: Reading from Cassandra using Spark Streaming for an example.
