ReduceByKeyAndWindow in spark2 flash frames

In Spark 1.6 with StreamingContext I could use the reduceByKeyAndWindow function

  events .mapToPair(x-> new Tuple2<String,MyPojo>(x.getID(),x)) .reduceByKeyAndWindow((a, b) -> a.getQuality() > b.getQuality() ? a : b , Durations.seconds(properties.getWindowLenght()), Durations.seconds(properties.getSlidingWindow())) .map(y->y._2); 

Now I am trying to reproduce this logic using spark 2.0.2 and Dataframes. I was able to reproduce the missing reduceByKey function, but without a window

  events .groupByKey(x-> x.getID() ,Encoders.STRING()) .reduceGroups((a,b)-> a.getQuality()>=b.getQuality() ? a : b) .map(x->x._2, Encoders.bean(MyPojo.class)) 

I managed to implement a window with groupBy

  events .groupBy(functions.window(col("timeStamp"), "10 minutes", "5 minutes"),col("id")) .max("quality") .join(events, "id"); 

When I used groupBy, I got only two columns from 15, so I tried to return them using join, but then I got excpetion: join between two streaming DataFrames/Datasets is not supported;

Is there a way to reproduce the reduceByKeyAndWindow logic in spark 2?

+5
source share

Source: https://habr.com/ru/post/1261141/


All Articles