Spark - Scala - saveAsHadoopFile throwing error

I would like to fix this problem, but I am stuck. Can anybody help?

    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

    class KeyBasedOutput[T >: Null, V <: AnyRef] extends MultipleTextOutputFormat[T, V] {
      override def generateFileNameForKeyValue(key: T, value: V, leaf: String) = {
        key.toString
      }
      override def generateActualKey(key: T, value: V) = {
        null
      }
    }

    val cp1 = sqlContext.sql("select * from d_prev_fact")
      .map(t => t.mkString("\t"))
      .map { x =>
        val parts = x.split("\t")
        val partition_key = parts(3)
        val rows = parts.slice(0, parts.length).mkString("\t")
        ("date=" + partition_key.toString, rows.toString)
      }

    cp1.saveAsHadoopFile(FACT_CP)

I get the error shown below and cannot debug it:

    scala> cp1.saveAsHadoopFile(FACT_CP, classOf[String], classOf[String], classOf[KeyBasedOutput[String, String]])
    java.lang.RuntimeException: java.lang.NoSuchMethodException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$KeyBasedOutput.<init>()
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
        at org.apache.hadoop.mapred.JobConf.getOutputFormat(JobConf.java:709)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:742)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:674)

The idea is to write the values to several folders based on the key.

+3
scala apache-spark
Sep 23 '14 at 1:47
2 answers

Put KeyBasedOutput in a jar and launch the Spark shell with --jars /path/to/the/jar. Classes defined inside the REPL are compiled as inner classes of the interpreter's wrapper objects (the $iwC$$iwC$... prefix in your stack trace), so Hadoop's ReflectionUtils cannot find a no-argument constructor to instantiate them; a class compiled into a jar does not have this problem.
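A minimal sketch of that workflow, assuming the class lives in its own source file and is packaged into a jar (the file name, build step, and jar path are placeholders, not part of the answer):

    // KeyBasedOutput.scala - the same class as in the question,
    // compiled ahead of time instead of being defined inside the REPL
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

    class KeyBasedOutput[T >: Null, V <: AnyRef] extends MultipleTextOutputFormat[T, V] {
      // Route each record to an output file named after its key
      override def generateFileNameForKeyValue(key: T, value: V, leaf: String) = key.toString
      // Suppress the key in the output; only the value is written
      override def generateActualKey(key: T, value: V) = null
    }

After packaging it (for example with sbt package), start the shell with the jar on the classpath:

    spark-shell --jars /path/to/the/jar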

+1
Jul 05 '16 at 20:56

I'm not sure, but I think type erasure combined with reflection may be causing this problem for you. Try defining a non-generic subclass of KeyBasedOutput that hard-codes the type parameters, and use that instead:

    class StringKeyBasedOutput extends KeyBasedOutput[String, String]
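With that subclass on the classpath, the call from the question would then pass the concrete class (a sketch reusing the question's FACT_CP output path):

    cp1.saveAsHadoopFile(FACT_CP, classOf[String], classOf[String], classOf[StringKeyBasedOutput])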
0
Sep 23 '14 at 16:33


