How to convert a Timestamp column to Date in a DataFrame?

I have a DataFrame with a Timestamp column that I need to convert to Date format.

Are there any Spark SQL functions for this?

4 answers

You can cast the column to the desired type:

Scala:

  import org.apache.spark.sql.types.DateType

  val newDF = df.withColumn("dateColumn", df("timestampColumn").cast(DateType))

Pyspark:

 df = df.withColumn('dateColumn', df['timestampColumn'].cast('date')) 
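A minimal runnable sketch of the Scala cast, end to end (the DataFrame and its contents here are illustrative, not from the question):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.current_timestamp
  import org.apache.spark.sql.types.DateType

  // Hypothetical demo: a small DataFrame with a timestamp column, then the cast.
  val spark = SparkSession.builder.master("local[*]").appName("castDemo").getOrCreate()
  val df = spark.range(3).toDF("id").withColumn("timestampColumn", current_timestamp())
  val newDF = df.withColumn("dateColumn", df("timestampColumn").cast(DateType))
  newDF.printSchema()  // dateColumn: date -- the time-of-day part is dropped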

In SparkSQL:

 SELECT CAST(the_ts AS DATE) AS the_date FROM the_table 
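If you are starting from a DataFrame, a short sketch of running that same query through spark.sql (it assumes a DataFrame df with a timestamp column named the_ts; the view name the_table mirrors the query above):

  // Register the DataFrame as a temp view, then run the SQL cast against it.
  df.createOrReplaceTempView("the_table")
  val theDates = spark.sql("SELECT CAST(the_ts AS DATE) AS the_date FROM the_table")
  theDates.show()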

Imagine the following input:

  val dataIn = spark.createDataFrame(Seq(
      (1, "some data"),
      (2, "more data")))
    .toDF("id", "stuff")
    .withColumn("ts", current_timestamp())

  dataIn.printSchema
  root
   |-- id: integer (nullable = false)
   |-- stuff: string (nullable = true)
   |-- ts: timestamp (nullable = false)

You can use the to_date function:

  val dataOut = dataIn.withColumn("date", to_date($"ts"))

  dataOut.printSchema
  root
   |-- id: integer (nullable = false)
   |-- stuff: string (nullable = true)
   |-- ts: timestamp (nullable = false)
   |-- date: date (nullable = false)

  dataOut.show(false)
  +---+---------+-----------------------+----------+
  |id |stuff    |ts                     |date      |
  +---+---------+-----------------------+----------+
  |1  |some data|2017-11-21 16:37:15.828|2017-11-21|
  |2  |more data|2017-11-21 16:37:15.828|2017-11-21|
  +---+---------+-----------------------+----------+

I would recommend preferring these methods over casting and plain SQL.
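Note that to_date also accepts a format pattern, which is useful when the source column is a string rather than a timestamp (available since Spark 2.2). A sketch with a hypothetical string column ts_str:

  import org.apache.spark.sql.functions.to_date

  // Hypothetical: ts_str holds strings like "21/11/2017 16:37:15".
  // The pattern argument tells to_date how to parse them (Spark 2.2+).
  val parsed = dataIn.withColumn("date", to_date($"ts_str", "dd/MM/yyyy HH:mm:ss"))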


For Spark 2.4+,

  import spark.implicits._
  import org.apache.spark.sql.types.DateType

  val newDF = df.withColumn("dateColumn", $"timestampColumn".cast(DateType))

OR

  import org.apache.spark.sql.functions.col
  import org.apache.spark.sql.types.DateType

  val newDF = df.withColumn("dateColumn", col("timestampColumn").cast(DateType))
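The same cast can also be written as a SQL expression string, for example with selectExpr (a sketch, keeping the column names from the snippet above):

  // Equivalent formulation via a SQL expression string.
  val newDF = df.selectExpr("*", "CAST(timestampColumn AS DATE) AS dateColumn")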
