From a Spark DataFrame, I need to convert a unix epoch timestamp column (e.g. 1509102527 = GMT: Friday, October 27, 2017 11:08:47) to a localized timestamp, so that I can get the local hour in a specific time zone.
Is there a Spark SQL function that can take a unix timestamp and return a localized java.sql.Timestamp?
I already tried the function from_unixtime, but it returns a timestamp localized to the default system time zone of the machine the code runs on. The only solution I have found so far is to convert that timestamp back to UTC and then from UTC to the target time zone. Below is a unit test that works with this workaround, but there should be a better way to do it.
test("timezone localization should not change effective unix timestamp") {
import org.apache.spark.sql.functions._
val df = Seq(1509102527)
.toDF("unix_timestamp")
.withColumn("machine_localised_timestamp", from_unixtime('unix_timestamp))
.withColumn("utc_timestamp", to_utc_timestamp('machine_localised_timestamp, TimeZone.getDefault().getID()))
.withColumn("local_time", from_utc_timestamp('utc_timestamp, "Europe/Amsterdam"))
.withColumn("local_hour", hour('local_time))
.withColumn("reverted_unix_timestamp", unix_timestamp('local_time))
df.show(false)
val row = df.collect()(0)
row(row.fieldIndex("unix_timestamp")) shouldBe 1509102527
row(row.fieldIndex("reverted_unix_timestamp")) shouldBe 1509102527
row(row.fieldIndex("local_hour")) shouldBe 13
}
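
For what it's worth, the same workaround can be wrapped in a small helper so it reads as a single column expression. This is only a sketch of the approach used in the test above; localHour is a hypothetical name, not an existing Spark function:

import java.util.TimeZone
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

// Hypothetical helper expressing the workaround as one column expression:
// unix seconds -> wall-clock hour in the given time zone.
def localHour(unixSeconds: Column, zone: String): Column =
  hour(
    from_utc_timestamp(
      // from_unixtime renders in the JVM default zone, so convert back to UTC first
      to_utc_timestamp(from_unixtime(unixSeconds), TimeZone.getDefault.getID),
      zone))

It would be used like .withColumn("local_hour", localHour($"unix_timestamp", "Europe/Amsterdam")), but it still goes through the double conversion I would like to avoid.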