I am using Spark 0.9.1 with MLlib 0.9.1 in DSE.
When I try to run the following code as a standalone application:
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

val parsedData = sc.parallelize((1 to 1000).map { line =>
  LabeledPoint(0.0, Array(0.0, 0.4, 0.3))
})
val numIterations = 2
val model = LinearRegressionWithSGD.train(parsedData, numIterations)
I get this error:
14/09/20 14:28:37 ERROR OneForOneStrategy: org.jblas.DoubleMatrix cannot be cast to org.jblas.DoubleMatrix
java.lang.ClassCastException: org.jblas.DoubleMatrix cannot be cast to org.jblas.DoubleMatrix
    at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$2.apply(GradientDescent.scala:150)
    at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$2.apply(GradientDescent.scala:150)
    at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:677)
    at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:674)
    at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:846)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:601)
This only happens when I launch it as a standalone application; the same code works in the Spark shell (dse spark). Any ideas?
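For reference, the standalone application wraps the same code roughly like this (a minimal sketch: the object name TrainJob and the way the SparkConf is set up are placeholders, not my exact code):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

object TrainJob {
  def main(args: Array[String]): Unit = {
    // Placeholder configuration; in practice the master and jars come from how
    // the job is launched, not from values hard-coded here.
    val sc = new SparkContext(new SparkConf().setAppName("analytics"))

    val parsedData = sc.parallelize((1 to 1000).map { _ =>
      LabeledPoint(0.0, Array(0.0, 0.4, 0.3))
    })
    val model = LinearRegressionWithSGD.train(parsedData, 2)

    sc.stop()
  }
}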
Update:
When I create an object in the REPL, getClassLoader returns:
scala> new org.jblas.DoubleMatrix().getClass().getClassLoader()
res3: ClassLoader = ModuleClassLoader:Analytics
But when I run it as a standalone application (launched with spark-class), it returns:
new org.jblas.DoubleMatrix().getClass().getClassLoader(): class= SystemClassLoader
Perhaps this is a clue.
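One way to compare the two environments further (a small sketch using only standard Java reflection, nothing DSE-specific) is to also print which jar the class is actually loaded from:

val clazz = classOf[org.jblas.DoubleMatrix]
// Classloader that resolved the class (what I printed above).
println("loader: " + clazz.getClassLoader)
// Jar/location the class bytes came from; may be None for bootstrap classes.
println("source: " + Option(clazz.getProtectionDomain.getCodeSource).map(_.getLocation))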
I use SBT to build the jar and ship it with spark-class. Here is the SBT configuration:
name := "analytics" version := "1.0" scalaVersion := "2.10.3" unmanagedJars in Compile ++= Attributed.blankSeq((file("./dse/lib/") * "*.jar").get) unmanagedJars in Compile ++= Attributed.blankSeq((file("./dse/resources/spark/lib/") * "*.jar").get) unmanagedJars in Compile ++= Attributed.blankSeq((file("./dse/resources/cassandra/lib/") * "*.jar").get) unmanagedJars in Runtime ++= Attributed.blankSeq((file("./dse/resources/hadoop/") * "*.jar").get) unmanagedJars in Runtime ++= Attributed.blankSeq((file("./dse/resources/hadoop/lib/") * "*.jar").get) unmanagedJars in Compile ++= Attributed.blankSeq((file("./dse/resources/driver/lib/") * "*.jar").get)
Update 2: I used the build configuration from the DSE demos to build and deploy with Ant, but I still hit the same error.