I am using two Jupyter notebooks to do different things in an analysis. In my Scala notebook, I write some of my cleaned data to parquet:
partitionedDF.select("noStopWords","lowerText","prediction").write.save("swift2d://xxxx.keystone/commentClusters.parquet")
Then I go to my Python notebook to read in the data:
df = spark.read.load("swift2d://xxxx.keystone/commentClusters.parquet")
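For what it's worth, my understanding is that load() with no format option defaults to parquet (via spark.sql.sources.default), so the explicit equivalents below should behave the same; this is just a sketch to rule out the default format as the issue:

# Explicitly naming the format; should be equivalent to the plain load() above
df = spark.read.format("parquet").load("swift2d://xxxx.keystone/commentClusters.parquet")
# or, using the parquet shortcut
df = spark.read.parquet("swift2d://xxxx.keystone/commentClusters.parquet")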
and I get the following error:
AnalysisException: u'Unable to infer schema for ParquetFormat at swift2d://RedditTextAnalysis.keystone/commentClusters.parquet. It must be specified manually;'
I looked at the Spark documentation, and I don't think I should be required to specify a schema. Has anyone run into something like this? Should I be doing something else when I save/load? The data is landing in Object Storage.
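If I did have to specify the schema manually, I assume it would look something like the sketch below; the column types are my guesses based on the pipeline (noStopWords comes out of a stop-word remover, prediction out of a clustering model), not anything confirmed:

from pyspark.sql.types import StructType, StructField, StringType, ArrayType, IntegerType

# Guessed schema: noStopWords is tokenized text with stop words removed,
# lowerText is the lowercased text, prediction is a cluster id.
schema = StructType([
    StructField("noStopWords", ArrayType(StringType()), True),
    StructField("lowerText", StringType(), True),
    StructField("prediction", IntegerType(), True),
])

df = spark.read.schema(schema).load("swift2d://xxxx.keystone/commentClusters.parquet")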
edit: I'm using Spark 2.0 for both the read and the write.
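(A quick way to double-check the version from the Python notebook, if that helps anyone reproduce this:)

# Sanity check: prints the running Spark version, e.g. '2.0.x'
print(spark.version)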
edit2: This was done in a project in Data Science Experience.