You are doing everything correctly; the only thing missing is calling over(window spec) on lag.
val df = sc.parallelize(Seq(
  (201601, 100.5), (201602, 120.6), (201603, 450.2),
  (201604, 200.7), (201605, 121.4)
)).toDF("date", "volume")

val w = org.apache.spark.sql.expressions.Window.orderBy("date")
import org.apache.spark.sql.functions.lag

// lag("volume", 1, 0): previous row's volume, with 0 as the default for the first row
val leadDf = df.withColumn("new_col", lag("volume", 1, 0).over(w))
leadDf.show()

+------+------+-------+
|  date|volume|new_col|
+------+------+-------+
|201601| 100.5|    0.0|
|201602| 120.6|  100.5|
|201603| 450.2|  120.6|
|201604| 200.7|  450.2|
|201605| 121.4|  200.7|
+------+------+-------+
This code was run in the Spark shell on Spark 2.0.2.
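As a side note (not part of the original answer): a window defined with orderBy but no partitionBy pulls all rows into a single partition, and Spark logs a warning about the potential performance impact. If the data has a natural grouping column, the window can be partitioned on it. The sketch below assumes a hypothetical "region" column that is not in the original DataFrame.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

// Hypothetical grouping column "region"; lag is computed independently per region.
val wPartitioned = Window.partitionBy("region").orderBy("date")
val laggedDf = df.withColumn("prev_volume", lag("volume", 1, 0).over(wPartitioned))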