I have a Spark cluster of two machines. When I launch the Spark Streaming application, I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Checkpoint RDD ReliableCheckpointRDD[11] at print at StatefulNetworkWordCount.scala:78(1) has different number of partitions from original RDD MapPartitionsRDD[10] at updateStateByKey at StatefulNetworkWordCount.scala:76(2)
at org.apache.spark.rdd.ReliableRDDCheckpointData.doCheckpoint(ReliableRDDCheckpointData.scala:73)
at org.apache.spark.rdd.RDDCheckpointData.checkpoint(RDDCheckpointData.scala:74)
How can I provide a checkpoint directory on a file system that is not HDFS / Cassandra / any other data store?
I thought of two possible solutions, but I don't know how to implement them:
Any suggestions?