Spark Streaming Window Operation

Below is a simple program that counts words over a window of 30 seconds with a slide interval of 10 seconds.

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.api.java.function._
import org.apache.spark.streaming.api._
import org.apache.spark.storage.StorageLevel

val ssc = new StreamingContext(sc, Seconds(5))

// read from text file
val lines0 = ssc.textFileStream("test")
val words0 = lines0.flatMap(_.split(" "))

// read from socket
val lines1 = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
val words1 = lines1.flatMap(_.split(" "))

val words = words0.union(words1)
val wordCounts = words.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
wordCounts.print()

ssc.checkpoint(".")
ssc.start()
ssc.awaitTermination()
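(For local testing, the socket source can be fed from another terminal with netcat, e.g. nc -lk 9999, as in the standard Spark Streaming examples, and the file source picks up new files dropped into the test directory. That is my assumed test setup, not something stated in the question.)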

However, I get an error from this line:

 val wordCounts = words.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10)) 

The problem seems to be specifically with _ + _. The error is:

 51: error: missing parameter type for expanded function ((x$2, x$3) => x$2.$plus(x$3)) 

Can someone tell me what the problem is? Thanks!

1 answer

This is very easy to fix: just specify the parameter types explicitly.

 val wordCounts = words.map((_, 1)).reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

The reason Scala cannot infer the type in this case is explained in this answer.
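For what it's worth, any form that gives the compiler the parameter types up front will compile. A minimal sketch, assuming the same words DStream built in the question (the separate reduceFunc val is only an illustration, not something the API requires):

 // The explicit type annotation on the val lets _ + _ be expanded,
 // because the expected type (Int, Int) => Int is known at that point.
 val reduceFunc: (Int, Int) => Int = _ + _

 val wordCounts = words
   .map((_, 1))
   .reduceByKeyAndWindow(reduceFunc, Seconds(30), Seconds(10))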



