Using rank() in Spark SQL

I need some pointers on using rank().

I retrieved a column from the dataset to do the ranking:

Column inputCol = inputDataset.apply("Colname");
Dataset<Row> DSColAwithIndex = inputDSAAcolonly.withColumn("df1Rank", rank());

DSColAwithIndex.show();

I can sort the column and then add an index column to get the rank, but I'm curious about the syntax and usage of rank().

2 answers

A window specification must be provided for rank():

val w = org.apache.spark.sql.expressions.Window.orderBy("date") //some spec    

val leadDf = inputDSAAcolonly.withColumn("df1Rank", rank().over(w))

Edit: here is the Java version of the answer, since the OP is using Java.

import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import static org.apache.spark.sql.functions.rank;

WindowSpec w = Window.orderBy(colName);
Dataset<Row> leadDf = inputDSAAcolonly.withColumn("df1Rank", rank().over(w));
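To clarify what rank() computes once a window ordering is in place: rows that tie on the ordering column share a rank, and the rank after a tie group skips ahead (unlike dense_rank). Spark itself is too heavy to demo inline, so here is a plain-Java sketch of that semantics; the class and method names are illustrative, not part of any Spark API.

```java
import java.util.Arrays;

public class RankDemo {
    // SQL-style rank(): tied values share a rank, and the next distinct
    // value's rank jumps to its 1-based position (leaving gaps after ties).
    static int[] sqlRank(int[] sortedValues) {
        int[] ranks = new int[sortedValues.length];
        for (int i = 0; i < sortedValues.length; i++) {
            if (i > 0 && sortedValues[i] == sortedValues[i - 1]) {
                ranks[i] = ranks[i - 1];  // tie: reuse previous rank
            } else {
                ranks[i] = i + 1;         // rank = 1-based position
            }
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Input is already sorted, as Window.orderBy would arrange it.
        int[] values = {10, 20, 20, 30};
        System.out.println(Arrays.toString(sqlRank(values))); // [1, 2, 2, 4]
    }
}
```

Note the gap: the two 20s both get rank 2, and 30 gets rank 4, not 3 — the same behavior you will see in the df1Rank column above when the ordering column has duplicates.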

I was searching for how to apply a rank to my data frame in Java.

Using the Java version from the answer above,

import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import static org.apache.spark.sql.functions.rank;

WindowSpec w = Window.orderBy(colName);
Dataset<Row> leadDf = inputDSAAcolonly.withColumn("df1Rank", rank().over(w));

worked for me, thanks gaurav.


Source: https://habr.com/ru/post/1671561/
