How to sort data in a Spark stream

I am new to Spark and am trying to write some sample code based on Spark and Spark Streaming.

So far, I have implemented sorting with Spark Core; here is the code:

  import org.apache.spark.{SparkConf, SparkContext}
  import scala.collection.mutable.ListBuffer
  import scala.util.Random

  def sort(listSize: Int, slice: Int): Unit = {
    val conf = new SparkConf().setAppName(getClass.getName)
    val spark = new SparkContext(conf)
    val data = genRandom(listSize)
    val distData = spark.parallelize(data, slice)
    val result = distData.sortBy(x => x, ascending = true)
    val finalResult = result.collect()
    printlnArray(finalResult, 0, 10)
    spark.stop()
  }

  /**
   * Generate a list of random numbers.
   * @return a list of `listSize` random ints
   */
  def genRandom(listSize: Int): List[Int] = {
    val range = 100000
    val listBuffer = new ListBuffer[Int]
    val random = new Random()
    for (i <- 1 to listSize) listBuffer += random.nextInt(range)
    listBuffer.toList
  }

  def printlnArray(list: Array[Int], start: Int, offset: Int): Unit = {
    for (i <- start until start + offset) println(">>>>>>>>> list : " + i + " | " + list(i))
  }

My problem is implementing the same sorting in Spark Streaming. As far as I know, the RDD API in Spark Core provides sorting, but there is no such API on DStream in Spark Streaming. Does anyone know how to do this? Thanks.

This may be a dumb question, but after searching on the Internet, I could not find the right answer. If someone knows how to solve it, thanks for your help.

1 answer

You can use the DStream `transform` operation, which exposes each batch of the stream as an ordinary RDD, so the RDD sorting API from Spark Core becomes available.

For instance:

    myDStream.transform(rdd => rdd.sortByKey())
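Since the question's stream holds plain `Int`s rather than key/value pairs, `sortBy` is the closer fit than `sortByKey`. A minimal sketch, assuming a socket text source on `localhost:9999` (both the source and the batch interval are illustrative placeholders, not part of the original answer):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamSortExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamSortExample").setMaster("local[2]")
    // one micro-batch every 5 seconds
    val ssc = new StreamingContext(conf, Seconds(5))

    // assumption: lines of integers arriving on a local socket
    val lines = ssc.socketTextStream("localhost", 9999)
    val numbers = lines.map(_.trim.toInt)

    // transform hands each batch to us as an ordinary RDD, so the
    // core sortBy API applies even though DStream itself has no sort
    val sorted = numbers.transform(rdd => rdd.sortBy(x => x, ascending = true))

    sorted.print() // prints the first elements of each sorted batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that this sorts each batch independently; there is no global ordering across batches of the stream.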

Source: https://habr.com/ru/post/1570124/
