In Spark 1.6, with a StreamingContext, I could use the reduceByKeyAndWindow function:
events
    .mapToPair(x -> new Tuple2<String, MyPojo>(x.getID(), x))
    .reduceByKeyAndWindow(
        (a, b) -> a.getQuality() > b.getQuality() ? a : b,
        Durations.seconds(properties.getWindowLenght()),
        Durations.seconds(properties.getSlidingWindow()))
    .map(y -> y._2);
Now I am trying to reproduce this logic using Spark 2.0.2 and DataFrames. I was able to replace the missing reduceByKey with groupByKey/reduceGroups, but without a window:
events
    .groupByKey(x -> x.getID(), Encoders.STRING())
    .reduceGroups((a, b) -> a.getQuality() >= b.getQuality() ? a : b)
    .map(x -> x._2, Encoders.bean(MyPojo.class));
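For reference, the per-key reduction above just keeps the highest-quality element for each ID. The same logic can be sketched in plain Java, independent of Spark; the MyPojo class here is a minimal stand-in with only the two fields the reduction uses:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceByKeyDemo {
    // Minimal stand-in for MyPojo: only the fields the reduction needs.
    static class MyPojo {
        final String id;
        final int quality;
        MyPojo(String id, int quality) { this.id = id; this.quality = quality; }
        String getID() { return id; }
        int getQuality() { return quality; }
    }

    // Equivalent of groupByKey(getID).reduceGroups(max-by-quality).
    static Map<String, MyPojo> reduceByKey(List<MyPojo> events) {
        Map<String, MyPojo> best = new HashMap<>();
        for (MyPojo e : events) {
            best.merge(e.getID(), e,
                (a, b) -> a.getQuality() >= b.getQuality() ? a : b);
        }
        return best;
    }

    public static void main(String[] args) {
        List<MyPojo> events = List.of(
            new MyPojo("a", 3), new MyPojo("a", 7), new MyPojo("b", 5));
        Map<String, MyPojo> best = reduceByKey(events);
        System.out.println(best.get("a").getQuality()); // 7
        System.out.println(best.get("b").getQuality()); // 5
    }
}
```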
I also managed to implement a window with groupBy:
events
    .groupBy(functions.window(col("timeStamp"), "10 minutes", "5 minutes"), col("id"))
    .max("quality")
    .join(events, "id");
When I used groupBy, I got back only two of the 15 columns, so I tried to recover the rest with a join, but then I got the exception: "join between two streaming DataFrames/Datasets is not supported".
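To make the intended semantics concrete: functions.window(col("timeStamp"), "10 minutes", "5 minutes") assigns each event to every window that overlaps its timestamp, so with a 10-minute length and 5-minute slide each event falls into two windows. A plain-Java sketch of that assignment (durations in minutes, assuming non-negative timestamps; not tied to any Spark API):

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowDemo {
    // Start times (in minutes) of all windows containing timestamp `ts`,
    // for windows of length `len` sliding every `slide` minutes.
    // Assumes ts >= 0 and len is a multiple of slide, as in the question.
    static List<Long> windowStarts(long ts, long len, long slide) {
        List<Long> starts = new ArrayList<>();
        long first = ts - (ts % slide); // newest window start <= ts
        for (long start = first; start > ts - len; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An event at minute 12, with 10-minute windows sliding every
        // 5 minutes, belongs to the windows starting at minutes 10 and 5,
        // i.e. [10, 20) and [5, 15).
        System.out.println(windowStarts(12, 10, 5)); // [10, 5]
    }
}
```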
Is there a way to reproduce the reduceByKeyAndWindow logic in Spark 2?