The solution is similar to "Get the top n in each DataFrame group in pyspark", which is written in PySpark.
The same approach in Scala looks like this:
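import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rank
import spark.implicits._  // enables the $"..." syntax; assumes a SparkSession named spark
df.withColumn("rank", rank().over(Window.partitionBy("Dept_id").orderBy($"salary".desc)))
  .filter($"rank" <= 3)  // keep rows ranked 1 to 3 within each Dept_id
  .drop("rank")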
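One thing to watch: rank() gives tied salaries the same rank, so a department can come back with more than 3 rows. If you need at most n rows per group, a rough sketch using row_number() instead is shown here. The object name TopNPerGroup, the helper topN, the "name" column and the sample data are all made up for illustration; only Dept_id and salary come from the snippet above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopNPerGroup {
  // Hypothetical helper: keeps at most n highest-salary rows per Dept_id.
  // row_number() numbers rows 1, 2, 3, ... with no gaps, so ties are cut
  // off arbitrarily instead of all being kept.
  def topN(df: DataFrame, n: Int): DataFrame = {
    val byDept = Window.partitionBy("Dept_id").orderBy(col("salary").desc)
    df.withColumn("rn", row_number().over(byDept))
      .filter(col("rn") <= n)
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("top-n-per-group")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: department 1 has a three-way tie at salary 90.
    val df = Seq(
      (1, "a", 100), (1, "b", 90), (1, "c", 90), (1, "d", 90),
      (2, "e", 50)
    ).toDF("Dept_id", "name", "salary")

    topN(df, 3).show()
    spark.stop()
  }
}
```

With this sample data, rank() would keep all four rows of department 1 (ranks 1, 2, 2, 2), while row_number() keeps only three of them. Which of the tied rows survives is not deterministic unless you add a tie-breaker column to the orderBy.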
I hope this answer is helpful.