Spark SQL window function

I am looking at the sliding window (lag) function for a Spark DataFrame in Spark SQL, using Scala.

I have a dataframe with columns Col1, Col2, Col3, and date.

 date    volume  new_col
 201601  100.5
 201602  120.6   100.5
 201603  450.2   120.6
 201604  200.7   450.2
 201605  121.4   200.7

Now I want to add a new column named new_col that holds the volume value from the previous row, as shown above.

I tried the option below, using the window function:

 val windSldBrdrxNrx_df = df.withColumn("Prev_brand_rx", lag("Prev_brand_rx",1)) 

Can anyone help me with how to do this?

2 answers

You are doing everything correctly; you are only missing over(window expression) on lag:

 val df = sc.parallelize(Seq(
   (201601, 100.5), (201602, 120.6), (201603, 450.2),
   (201604, 200.7), (201605, 121.4)
 )).toDF("date", "volume")

 val w = org.apache.spark.sql.expressions.Window.orderBy("date")
 import org.apache.spark.sql.functions.lag

 val leadDf = df.withColumn("new_col", lag("volume", 1, 0).over(w))
 leadDf.show()

 +------+------+-------+
 |  date|volume|new_col|
 +------+------+-------+
 |201601| 100.5|    0.0|
 |201602| 120.6|  100.5|
 |201603| 450.2|  120.6|
 |201604| 200.7|  450.2|
 |201605| 121.4|  200.7|
 +------+------+-------+

This code was run in the Spark shell on Spark 2.0.2.
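The same result can also be expressed in plain Spark SQL by registering the DataFrame as a temporary view. This is a minimal sketch, not part of the original answer; the view name volumes is made up for illustration:

 // hypothetical view name; df is the same DataFrame as above
 df.createOrReplaceTempView("volumes")

 val sqlDf = spark.sql("""
   SELECT date,
          volume,
          LAG(volume, 1, 0) OVER (ORDER BY date) AS new_col
   FROM volumes
 """)
 sqlDf.show()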


You can import the two packages below, which will resolve the lag function:

 import org.apache.spark.sql.functions.{lead, lag}
 import org.apache.spark.sql.expressions.Window
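For reference, a minimal sketch of using both lag and lead over the same window once those imports are in place; the window and column names follow the dataframe from the first answer and are not part of this answer:

 import org.apache.spark.sql.expressions.Window
 import org.apache.spark.sql.functions.{lag, lead}

 val w = Window.orderBy("date")

 // previous row's volume (lag) and next row's volume (lead), both defaulting to 0
 val withBoth = df
   .withColumn("prev_volume", lag("volume", 1, 0).over(w))
   .withColumn("next_volume", lead("volume", 1, 0).over(w))
 withBoth.show()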

Source: https://habr.com/ru/post/1261308/
