SortByKey in Spark

Question

I am new to Spark and Scala and am trying to sort the output of a word count example. My code is based on a simple example. I want to sort the results alphabetically by key. If I add key sorting to the RDD:

 val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()

then I get a compilation error:

error: No implicit view available from java.io.Serializable => Ordered[java.io.Serializable].
[INFO]     val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()

I do not know what the lack of an implicit view means. Can someone tell me how to fix this? I am running the Cloudera 5 QuickStart VM, which I think bundles Spark 0.9.

Scala Job Source

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SparkWordCount {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))

    val files = sc.textFile(args(0)).map(_.split(","))

    def f(x:Array[String]) = {
      if (x.length > 3)
        x(3)
      else
        Array("NO NAME")
    }

    val names = files.map(f)

    val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()

    System.out.println(wordCounts.collect().mkString("\n"))
  }
}

Some (unsorted) output

("INTERNATIONAL EYELETS INC",879)
("SHAQUITA SALLEY",865)
("PAZ DURIGA",791)
("TERESSA ALCARAZ",824)
("MING CHAIX",878)
("JACKSON SHIELDS YEISER",837)
("AUDRY HULLINGER",875)
("GABRIELLE MOLANDS",802)
("TAM TACKER",775)
("HYACINTH VITELA",837)

scala apache-spark

ahoffer Jun 10 '14 at 18:33

1 answer

aaronman · Accepted Answer · 2014-06-10T19:19:48+0000

"No implicit view available" means that no Scala function like the following is defined:

implicit def SerializableToOrdered(x :java.io.Serializable) = new Ordered[java.io.Serializable](x) //note this function doesn't work
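
For contrast, here is a minimal sketch of an implicit view that does work, written for Scala 2. The `Name` type, the `nameToOrdered` conversion, and the `smaller` helper are hypothetical names made up for this illustration; none of them come from Spark or from the question.

```scala
import scala.language.implicitConversions

// Hypothetical key type, used only for this illustration.
final case class Name(value: String)

object Name {
  // An implicit view from Name to Ordered[Name]: the kind of function the
  // compiler could not find for java.io.Serializable.
  implicit def nameToOrdered(n: Name): Ordered[Name] = new Ordered[Name] {
    def compare(that: Name): Int = n.value.compareTo(that.value)
  }
}

object ImplicitViewDemo {
  // Asks for a view K => Ordered[K], which is what the error message says is missing.
  def smaller[K](a: K, b: K)(implicit view: K => Ordered[K]): K =
    if (view(a) < b) a else b

  def main(args: Array[String]): Unit = {
    // Compiles because the view for Name is found in Name's companion object.
    println(smaller(Name("PAZ DURIGA"), Name("MING CHAIX")))
  }
}
```

Calling `smaller` with two values of a type that has no such view fails to compile with the same kind of "No implicit view available" message.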

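You get this error because your function f returns two different types whose common supertype is java.io.Serializable: one branch returns a String and the other an Array[String]. After reduceByKey, sortByKey needs a key type that can be ordered, and java.io.Serializable cannot. Fix it like this: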

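import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SparkWordCount {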
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))

    val files = sc.textFile(args(0)).map(_.split(","))

    def f(x:Array[String]) = {
      if (x.length > 3)
        x(3)
      else
        "NO NAME"
    }

    val names = files.map(f)

    val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()

    System.out.println(wordCounts.collect().mkString("\n"))
  }
}
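
The same type inference issue can be reproduced without Spark. The sketch below uses plain Scala collections in place of an RDD, so `groupBy` and `sortBy` only stand in for `reduceByKey` and `sortByKey`; the object name, the method names, and the sample rows are assumptions made for illustration.

```scala
object WordCountSketch {
  // Like the question's f: the branches yield String and Array[String], so the
  // compiler falls back to a common supertype that has no ordering.
  def mixed(x: Array[String]) =
    if (x.length > 3) x(3) else Array("NO NAME")

  // Like the corrected f: both branches yield String, so the key type is String.
  def fixed(x: Array[String]): String =
    if (x.length > 3) x(3) else "NO NAME"

  // Count names and sort by key, using plain collections in place of an RDD.
  def sortedCounts(rows: Seq[Array[String]]): Seq[(String, Int)] =
    rows.map(fixed)
      .groupBy(identity)
      .map { case (name, hits) => (name, hits.size) }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    // Made-up sample rows; the name is expected in the fourth column.
    val rows = Seq(
      Array("1", "2", "3", "ZED"),
      Array("too short"),
      Array("1", "2", "3", "ABE"),
      Array("1", "2", "3", "ZED")
    )
    println(sortedCounts(rows).mkString("\n"))
    // rows.map(mixed).sorted  // would not compile: no Ordering for the inferred element type
  }
}
```

Giving f an explicit return type of String, as `fixed` does here, also turns the mistake into a type mismatch reported at the offending branch rather than at the later sortByKey call.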
