In Spark 1.6, with a StreamingContext, I could use the reduceByKeyAndWindow function:
events
    .mapToPair(x -> new Tuple2<String, MyPojo>(x.getID(), x))
    .reduceByKeyAndWindow(
        (a, b) -> a.getQuality() > b.getQuality() ? a : b,
        Durations.seconds(properties.getWindowLenght()),
        Durations.seconds(properties.getSlidingWindow()))
    .map(y -> y._2);
Now I am trying to reproduce this logic using Spark 2.0.2 and DataFrames. I was able to replace the missing reduceByKey with groupByKey/reduceGroups, but without a window:
events
    .groupByKey(x -> x.getID(), Encoders.STRING())
    .reduceGroups((a, b) -> a.getQuality() >= b.getQuality() ? a : b)
    .map(x -> x._2, Encoders.bean(MyPojo.class));
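For reference, the per-key reduction above just keeps the highest-quality element for each ID. The same logic can be sketched in plain Java, independent of Spark; the MyPojo class here is a minimal stand-in with only the two fields the reduction uses:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceByKeyDemo {
    // Minimal stand-in for MyPojo: only the fields the reduction needs.
    static class MyPojo {
        final String id;
        final int quality;
        MyPojo(String id, int quality) { this.id = id; this.quality = quality; }
        String getID() { return id; }
        int getQuality() { return quality; }
    }

    // Equivalent of groupByKey(getID).reduceGroups(max-by-quality).
    static Map<String, MyPojo> reduceByKey(List<MyPojo> events) {
        Map<String, MyPojo> best = new HashMap<>();
        for (MyPojo e : events) {
            best.merge(e.getID(), e,
                (a, b) -> a.getQuality() >= b.getQuality() ? a : b);
        }
        return best;
    }

    public static void main(String[] args) {
        List<MyPojo> events = List.of(
            new MyPojo("a", 3), new MyPojo("a", 7), new MyPojo("b", 5));
        Map<String, MyPojo> best = reduceByKey(events);
        System.out.println(best.get("a").getQuality()); // 7
        System.out.println(best.get("b").getQuality()); // 5
    }
}
```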
I also managed to implement a window with groupBy:
events
    .groupBy(functions.window(col("timeStamp"), "10 minutes", "5 minutes"), col("id"))
    .max("quality")
    .join(events, "id");
When I used groupBy, I got back only two of the 15 columns, so I tried to recover the rest with a join, but then I got the exception: "join between two streaming DataFrames/Datasets is not supported".
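To make the intended semantics concrete: functions.window(col("timeStamp"), "10 minutes", "5 minutes") assigns each event to every window that overlaps its timestamp, so with a 10-minute length and 5-minute slide each event falls into two windows. A plain-Java sketch of that assignment (durations in minutes, assuming non-negative timestamps; not tied to any Spark API):

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowDemo {
    // Start times (in minutes) of all windows containing timestamp `ts`,
    // for windows of length `len` sliding every `slide` minutes.
    // Assumes ts >= 0 and len is a multiple of slide, as in the question.
    static List<Long> windowStarts(long ts, long len, long slide) {
        List<Long> starts = new ArrayList<>();
        long first = ts - (ts % slide); // newest window start <= ts
        for (long start = first; start > ts - len; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An event at minute 12, with 10-minute windows sliding every
        // 5 minutes, belongs to the windows starting at minutes 10 and 5,
        // i.e. [10, 20) and [5, 15).
        System.out.println(windowStarts(12, 10, 5)); // [10, 5]
    }
}
```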
Is there a way to reproduce the reduceByKeyAndWindow logic in Spark 2?