Why is the Spark reduceByKey result inconsistent?

I am trying to count the number of occurrences of each row using Spark with Scala.
Below is my input:

1 Vikram
2 sachin
3 shobit
4 alok
5 akul
5 akul
1 Vikram
1 Vikram
3 shobit
10 ashu
5 akul
1 Vikram
2 sachin
7 Vikram

I now create 2 separate RDDs as follows.

val f1 = sc.textFile("hdfs:///path to above data file")
val m1 = f1.map( s => (s.split(" ")(0),1) ) //creating a tuple (key,1)
//now if I create an RDD as
val rd1 = m1.reduceByKey((a,b) => a+b )
rd1.collect().foreach(println)
//I get the correct output (and it is correct every time):
//output: (4,1) (2,2) (7,1) (5,3) (3,2) (1,4) (10,1)

//but if I create an RDD as
val rd2 = m1.reduceByKey((a,b) => a+1 )
rd2.collect().foreach(println)
//I get inconsistent results, i.e. sometimes I get this (WRONG):
//output: (4,1) (2,2) (7,1) (5,2) (3,2) (1,2) (10,1)
//and sometimes I get this (CORRECT):
//output: (4,1) (2,2) (7,1) (5,3) (3,2) (1,4) (10,1) 

I can’t understand why this happens, or when to use which form. I also tried creating the RDD as:

val m2 = f1.map(s => (s,1))
val rd3 = m2.reduceByKey((a,b) => a+1 )
// The same issue occurs with a+1, but everything works fine with a+b
2 answers

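The function you pass to reduceByKey must be associative (see the docs): (a, b) => a + b is, but (a, b) => a+1 is not.

Why? Roughly speaking, reduceByKey first applies the function to the records within each partition, and then applies it again to combine the partial results from different partitions. So b is not always 1, and a+1 silently drops whatever count b carries.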
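To make the two phases concrete, here is a minimal plain-Scala simulation (no Spark required) of a reduceByKey-style computation over the keys from the question. It is an illustration under stated assumptions, not Spark's implementation: the split into two partitions is a hypothetical choice (Spark decides the real partitioning), and `reduceByKeySim`, `combineWithin` and `mergeAcross` are made-up names.

```scala
// Plain-Scala simulation of a two-phase, reduceByKey-style reduction.
// The partitioning below is hypothetical; all names are illustrative only.
object ReduceByKeySimulation {
  type Partition = Seq[(String, Int)]

  // Phase 1: reduce the values of each key inside a single partition.
  def combineWithin(p: Partition, f: (Int, Int) => Int): Map[String, Int] =
    p.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  // Phase 2: merge the per-partition partial results, left to right.
  // For a key present in both maps, f(leftPartial, rightPartial) is applied.
  def mergeAcross(parts: Seq[Map[String, Int]], f: (Int, Int) => Int): Map[String, Int] =
    parts.reduce { (left, right) =>
      (left.keySet ++ right.keySet).map { k =>
        k -> (left.get(k).toList ++ right.get(k).toList).reduce(f)
      }.toMap
    }

  def reduceByKeySim(parts: Seq[Partition], f: (Int, Int) => Int): Map[String, Int] =
    mergeAcross(parts.map(combineWithin(_, f)), f)

  // First fields of the question's input rows, in file order.
  val keys: Seq[String] =
    Seq("1", "2", "3", "4", "5", "5", "1", "1", "3", "10", "5", "1", "2", "7")

  // Hypothetical split: first 7 rows in one partition, last 7 in the other.
  val partitions: Seq[Partition] =
    Seq(keys.take(7), keys.drop(7)).map(_.map(k => (k, 1)))

  def main(args: Array[String]): Unit = {
    println(reduceByKeySim(partitions, (a, b) => a + b)) // correct counts
    println(reduceByKeySim(partitions, (a, b) => a + 1)) // undercounts key "1"
  }
}
```

With this particular split, `a + b` produces the correct counts, while `a + 1` reports 3 for key "1" instead of 4: that key has a partial count of 2 in each partition, and the merge step adds 1 instead of 2. Key "5" also spans both partitions but happens to come out right, because the second partition's partial count for it is exactly 1.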

For example, suppose you have 4 records in 2 partitions, two records each:

(aa, 1)
(aa, 1)

(aa, 1)
(cc, 1)

reduceByKey(f) might then compute the result like this:

val intermediate1 = f((aa, 1), (aa, 1)) 
val intermediate2 = f((aa, 1), (cc, 1))

val result = f(intermediate2, intermediate1)

With f = (a, b) => a + b:

val intermediate1 = f((aa, 1), (aa, 1))       // (aa, 2)
val intermediate2 = f((aa, 1), (cc, 1))       // (aa, 1), (cc, 1)

val result = f(intermediate2, intermediate1)  // (aa, 3), (cc, 1)

And with f = (a, b) => a + 1:

val intermediate1 = f((aa, 1), (aa, 1))       // (aa, 2)
val intermediate2 = f((aa, 1), (cc, 1))       // (aa, 1), (cc, 1)

// this is where it goes wrong:
val result = f(intermediate2, intermediate1)  // (aa, 2), (cc, 1)

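The key point is that the order of these intermediate calculations is not guaranteed and can change between runs, which is why the a+1 version is only sometimes wrong.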
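The order dependence can be seen in isolation with plain Scala. This sketch takes key `aa` from the example above: the first partition's partial result is computed from its two `(aa, 1)` records, the second partition contributes a single 1, and the two partial results are then merged in both possible orders. `MergeOrderDemo`, `sum` and `plusOne` are hypothetical names for this illustration; no Spark API is involved.

```scala
// Plain-Scala sketch (no Spark): merge the partial results for key "aa"
// in both orders. All names here are hypothetical.
object MergeOrderDemo {
  val sum: (Int, Int) => Int = (a, b) => a + b
  val plusOne: (Int, Int) => Int = (a, _) => a + 1 // ignores b entirely

  // Partition 1 holds (aa, 1) twice; partition 2 holds (aa, 1) once.
  val sumPartial1: Int = sum(1, 1)         // 2
  val plusOnePartial1: Int = plusOne(1, 1) // also 2, because here b happens to be 1
  val partial2: Int = 1                    // a single record needs no reduction

  def main(args: Array[String]): Unit = {
    // Addition gives 3 whichever partial result comes first.
    println((sum(sumPartial1, partial2), sum(partial2, sumPartial1))) // (3,3)
    // a + 1 gives 3 in one order but 2 in the other.
    println((plusOne(plusOnePartial1, partial2), plusOne(partial2, plusOnePartial1))) // (3,2)
  }
}
```

The second order corresponds to `f(intermediate2, intermediate1)` in the walk-through: the partial count 2 arrives as `b` and is replaced by a constant 1.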


(a, b) => (a + 1) is not associative. Associativity means:

f(a, f(b, c)) = f(f(a, b), c)

Now suppose:

a = (x, 1)
b = (x, 1)
c = (x, 1)

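With f = (a, b) => (a + 1):

f(a, f(b, c)) = (x, 2)

but

f(f(a, b), c) = (x, 3)

So the result depends on how the values are grouped, and reduceByKey gives inconsistent results with this function.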
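The Spark API documentation describes the function passed to reduceByKey as associative and commutative. A quick way to test a candidate function against both properties is a brute-force check over a few sample values, sketched below in plain Scala. `PropertyCheck`, `isAssociative` and `isCommutative` are hypothetical names, and passing on a handful of samples is only evidence, not a proof; a single failure, however, is a genuine counterexample.

```scala
// Brute-force check of associativity and commutativity over sample values.
// Helper names are hypothetical; this is not part of any Spark API.
object PropertyCheck {
  val samples: Seq[Int] = Seq(1, 2, 3)

  // f(a, f(b, c)) == f(f(a, b), c) for every sampled a, b, c
  def isAssociative(f: (Int, Int) => Int): Boolean =
    samples.forall(a => samples.forall(b => samples.forall(c =>
      f(a, f(b, c)) == f(f(a, b), c))))

  // f(a, b) == f(b, a) for every sampled a, b
  def isCommutative(f: (Int, Int) => Int): Boolean =
    samples.forall(a => samples.forall(b => f(a, b) == f(b, a)))

  val sum: (Int, Int) => Int = (a, b) => a + b
  val plusOne: (Int, Int) => Int = (a, _) => a + 1

  def main(args: Array[String]): Unit = {
    println(s"a + b: associative=${isAssociative(sum)}, commutative=${isCommutative(sum)}")
    println(s"a + 1: associative=${isAssociative(plusOne)}, commutative=${isCommutative(plusOne)}")
    // The answer's counterexample, applied to the values 1, 1, 1 of key x:
    println((plusOne(1, plusOne(1, 1)), plusOne(plusOne(1, 1), 1))) // (2,3)
  }
}
```

Here `a + b` passes both checks, `a + 1` fails both, and the last line reproduces the counterexample: grouping the same three values differently yields 2 or 3.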


Source: https://habr.com/ru/post/1653798/

