I have the following Spark assignment:
from __future__ import print_function
import os
import sys
import time
from random import random
from operator import add

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, Row
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark_cassandra import streaming, CassandraSparkContext

if __name__ == "__main__":
    conf = SparkConf().setAppName("PySpark Cassandra Test")
    sc = CassandraSparkContext(conf=conf)
    stream = StreamingContext(sc, 2)
    rdd = sc.cassandraTable("keyspace2", "users").collect()
When I run this, it gives me the following error:
ERROR StreamingContext: Error starting the context, marking it as stopped
java.lang.IllegalArgumentException: requirement failed:
No output operations registered, so nothing to execute
The shell command I run:
./bin/spark-submit --packages TargetHolding:pyspark-cassandra:0.2.4 examples/src/main/python/test/reading-cassandra.py
Comparing this with Spark Streaming's Kafka integration, the code above is missing a line like this:
kafkaStream = KafkaUtils.createStream(stream, 'localhost:2181', "name", {'topic':1})
where `createStream` actually wires up the stream, but for Cassandra I don't see anything like this in the docs. How do I start streaming between Spark Streaming and Cassandra?
Versions
Cassandra v2.1.12
Spark v1.4.1
Scala v2.10