Imagine the following input:
import org.apache.spark.sql.functions.{current_timestamp, to_date}
import spark.implicits._ // for toDF and the $"col" syntax

val dataIn = spark.createDataFrame(Seq(
    (1, "some data"),
    (2, "more data")))
  .toDF("id", "stuff")
  .withColumn("ts", current_timestamp())

dataIn.printSchema

root
 |-- id: integer (nullable = false)
 |-- stuff: string (nullable = true)
 |-- ts: timestamp (nullable = false)
You can use the to_date function:
val dataOut = dataIn.withColumn("date", to_date($"ts"))

dataOut.printSchema

root
 |-- id: integer (nullable = false)
 |-- stuff: string (nullable = true)
 |-- ts: timestamp (nullable = false)
 |-- date: date (nullable = false)

dataOut.show(false)

+---+---------+-----------------------+----------+
|id |stuff    |ts                     |date      |
+---+---------+-----------------------+----------+
|1  |some data|2017-11-21 16:37:15.828|2017-11-21|
|2  |more data|2017-11-21 16:37:15.828|2017-11-21|
+---+---------+-----------------------+----------+
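If the source column were a string rather than a timestamp, to_date also accepts a format pattern (available since Spark 2.2). A quick sketch with made-up sample values; the column name s and the pattern are illustrative:

// Parse date strings with an explicit pattern
val parsed = Seq("2017/11/21", "2017/11/22").toDF("s")
  .withColumn("date", to_date($"s", "yyyy/MM/dd"))

parsed.printSchema

root
 |-- s: string (nullable = true)
 |-- date: date (nullable = true)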
I would recommend preferring these methods over casting and plain SQL.
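For comparison, a minimal sketch of the two alternatives that recommendation refers to, reusing the dataIn DataFrame from above (the view name "events" is made up for this example):

// Casting: convert the timestamp column with cast instead of to_date
val viaCast = dataIn.withColumn("date", $"ts".cast("date"))

// Plain SQL: register a temporary view and call to_date in a query
dataIn.createOrReplaceTempView("events")
val viaSql = spark.sql("SELECT id, stuff, ts, to_date(ts) AS date FROM events")

Both produce the same schema and values as dataOut above; the built-in to_date column function just keeps the logic typed and chainable.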