What is the meaning of reduceByKey(_ ++ _)?

I recently wrote a script that stores data as key-value pairs and came across the function reduceByKey(_ ++ _). It is shorthand syntax, and I can't work out what it actually means.

For example, reduceByKey(_ + _) means reduceByKey((a, b) => a + b).

So what does reduceByKey(_ ++ _) mean?

I can create key-value pairs from the data using reduceByKey(_ ++ _):

```scala
// Run in spark-shell, where the SparkContext `sc` is predefined.
val y = sc.textFile("file:///root/My_Spark_learning/reduced.txt")

y.map(value => value.split(","))
  .map(value => (value(0), value(1), value(2)))
  .collect()
  .foreach(println)
// (1,2,3)
// (1,3,4)
// (4,5,6)
// (7,8,9)

y.map(value => value.split(","))
  .map(value => (value(0), Seq(value(1), value(2))))
  .reduceByKey(_ ++ _)
  .collect()
  .foreach(println)
// (1,List(2, 3, 3, 4))
// (4,List(5, 6))
// (7,List(8, 9))
```
2 answers

reduceByKey(_ ++ _) translates to reduceByKey((a, b) => a ++ b).
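The placeholder expansion can be checked in plain Scala, without Spark. The sketch below (names such as `shorthand` and `expanded` are illustrative only) defines the same two-argument function both ways and applies it to two of the value lists from the question:

```scala
// `_ ++ _` is placeholder syntax: the first `_` becomes the first parameter
// and the second `_` becomes the second parameter of an anonymous function.
val shorthand: (Seq[String], Seq[String]) => Seq[String] = _ ++ _
val expanded: (Seq[String], Seq[String]) => Seq[String] = (a, b) => a ++ b

// The same expansion applies to `_ + _` from the question.
val plusShorthand: (Int, Int) => Int = _ + _
val plusExpanded: (Int, Int) => Int = (a, b) => a + b

val left = Seq("2", "3")
val right = Seq("3", "4")

println(shorthand(left, right)) // List(2, 3, 3, 4)
println(expanded(left, right))  // List(2, 3, 3, 4)
println(plusShorthand(1, 2))    // 3
```

Both forms compile to the same kind of function value, which is what reduceByKey receives.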

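++ is a method defined on List (and on Scala collections in general) that returns a new collection containing the elements of the first collection followed by the elements of the second; neither operand is modified.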
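A small plain-Scala sketch makes the "new collection" point concrete: after `a ++ b`, both original lists are still intact.

```scala
// `++` on an immutable List builds a new List; neither operand is modified.
val a = List(2, 3)
val b = List(3, 4)
val combined = a ++ b

println(combined) // List(2, 3, 3, 4)
println(a)        // List(2, 3)  (unchanged)
println(b)        // List(3, 4)  (unchanged)
```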

So, for key 1 in the sample data, a will be List(2, 3) and b will be List(3, 4), and concatenating them (List(2, 3) ++ List(3, 4)) gives List(2, 3, 3, 4).


reduceByKey(_ ++ _) is equivalent to reduceByKey((x, y) => x ++ y). The function passed to reduceByKey takes two values that share the same key, combines them, and returns a single value of the same type.

Conceptually, reduceByKey collects the values that share a key and merges them pairwise with the given function, and ++ simply concatenates two collections into one that contains the elements of both.
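To see this behaviour without a Spark cluster, the sketch below imitates reduceByKey(_ ++ _) on an ordinary Scala collection: group the (key, value) pairs by key, then reduce each group's values with `_ ++ _`. It is a simplified local model, not Spark's implementation (Spark also merges values inside each partition before shuffling). The helper `localReduceByKey` is hypothetical, and the sample rows are hard-coded instead of being read with sc.textFile.

```scala
// Sample rows from the question, hard-coded instead of read from a file.
val lines = Seq("1,2,3", "1,3,4", "4,5,6", "7,8,9")

// Same parsing as the question: key = first field, value = Seq of the other two.
val pairs: Seq[(String, Seq[String])] =
  lines.map(_.split(",")).map(v => (v(0), Seq(v(1), v(2))))

// Hypothetical local stand-in for reduceByKey: group by key, then reduce
// each group's values with the supplied two-argument function.
def localReduceByKey[K, V](data: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
  data.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

val reduced = localReduceByKey(pairs)(_ ++ _)

// Sort by key only so the printed order is stable.
reduced.toSeq.sortBy(_._1).foreach(println)
// (1,List(2, 3, 3, 4))
// (4,List(5, 6))
// (7,List(8, 9))
```

Keys 4 and 7 have only one value each, so the function is never called for them and their lists pass through unchanged.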

The values for each key end up combined into a single list. In your case, for key 1, x will be List(2, 3) and y will be List(3, 4), and ++ combines them into List(2, 3, 3, 4).

If there is another row with the same key, for example (1,4,5), then x will be the list built so far, List(2, 3, 3, 4), y will be List(4, 5), and the result will be List(2, 3, 3, 4, 4, 5).
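The three-row case can be sketched with List.reduce, which applies the function pairwise from left to right. This assumes the values for key 1 arrive in file order; in Spark that order is not guaranteed, because reduceByKey expects an associative and commutative function and ++ is only associative, so the elements may come out in a different order.

```scala
// Values for key 1 once a third row (1,4,5) is added to the sample data.
val valuesForKey1 = List(List(2, 3), List(3, 4), List(4, 5))

// reduce applies the function pairwise: first List(2,3) ++ List(3,4),
// then that intermediate result ++ List(4,5).
val result = valuesForKey1.reduce(_ ++ _)

println(result) // List(2, 3, 3, 4, 4, 5)
```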


Source: https://habr.com/ru/post/1268121/

