Spark, Scala - How to get the top 3 values for each group of two columns in a DataFrame

I have one DataFrame that contains these values:

Dept_id | name | salary
1       | A    | 10
2       | B    | 100
1       | D    | 100
2       | C    | 105
1       | N    | 103
2       | F    | 102
1       | K    | 90
2       | E    | 110

I want the result in this form:

Dept_id | name | salary
1       | N    | 103
1       | D    | 100
1       | K    | 90
2       | E    | 110
2       | C    | 105
2       | F    | 102

Thanks in advance :).

1 answer

The solution is similar to Get the top n in each DataFrame group in pyspark, but that one is written in PySpark.

The Scala equivalent would look like this:

df.withColumn("rank", rank().over(Window.partitionBy("Dept_id").orderBy($"salary".desc)))
    .filter($"rank" <= 3)
    .drop("rank")

Hope this helps.



