When I run the code
val home = "/Users/adremja/Documents/Kaggle/outbrain"
val documents_categories = sc.textFile(home + "/documents_categories.csv")
documents_categories take(10) foreach println
in the Spark shell, it works great:
scala> val home = "/Users/adremja/Documents/Kaggle/outbrain"
home: String = /Users/adremja/Documents/Kaggle/outbrain
scala> val documents_categories = sc.textFile(home + "/documents_categories.csv")
documents_categories: org.apache.spark.rdd.RDD[String] = /Users/adremja/Documents/Kaggle/outbrain/documents_categories.csv MapPartitionsRDD[21] at textFile at <console>:26
scala> documents_categories take(10) foreach println
document_id,category_id,confidence_level
1595802,1611,0.92
1595802,1610,0.07
1524246,1807,0.92
1524246,1608,0.07
1617787,1807,0.92
1617787,1608,0.07
1615583,1305,0.92
1615583,1806,0.07
1615460,1613,0.540646372
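
In Zeppelin I run the same lines in a notebook paragraph, roughly like this (a sketch; %spark is the default Scala interpreter binding, and the code is identical to what I ran in the shell):

%spark
val home = "/Users/adremja/Documents/Kaggle/outbrain"
val documents_categories = sc.textFile(home + "/documents_categories.csv")
documents_categories take(10) foreach println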
However, when I run it in Zeppelin, I get an error:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
at org.apache.spark.SparkContext.withScope(SparkContext.scala:679)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:797)
... 46 elided
Do you have any idea where the problem is?

I have Spark 2.0.1 from Homebrew (I linked it in zeppelin-env.sh as SPARK_HOME) and Zeppelin 0.6.2 from the Zeppelin website.
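
For completeness, the SPARK_HOME line in my conf/zeppelin-env.sh looks roughly like this (the exact path is only illustrative, a typical Homebrew location for Spark 2.0.1):

# conf/zeppelin-env.sh -- point Zeppelin at the external Spark install
# (illustrative path; adjust to wherever brew actually put Spark)
export SPARK_HOME=/usr/local/Cellar/apache-spark/2.0.1/libexec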