It's simple: use a dot to select a field inside a nested structure, for example $"foo.baz":
    case class Foo(bar: String, baz: String)
    case class Record(foo: Foo)

    val df = Seq(
      Record(Foo("Hi", "There"))
    ).toDF()

    df.printSchema

    root
     |-- foo: struct (nullable = true)
     |    |-- bar: string (nullable = true)
     |    |-- baz: string (nullable = true)

    val myUDF = udf((s:String) => {
If you want to add the result of your UDF to the existing foo struct, i.e. to get:
    root
     |-- foo: struct (nullable = false)
     |    |-- bar: string (nullable = true)
     |    |-- baz: string (nullable = true)
     |    |-- udfResult: string (nullable = true)
There are two options:
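With withColumn (add the result as a temporary top-level column, rebuild the struct, then drop the temporary column):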
    df
      .withColumn("udfResult", myUDF($"foo.baz"))
      .withColumn("foo", struct($"foo.*", $"udfResult"))
      .drop($"udfResult")
With select (build the new struct in a single expression; note that select keeps only the columns you list):
    df
      .select(struct($"foo.*", myUDF($"foo.baz").as("udfResult")).as("foo"))
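To try both options end to end, here is a minimal sketch. It assumes Spark is on the classpath and builds a throwaway local SparkSession; the UDF name shout and its upper-casing body are illustrative placeholders, not something taken from the question. The nested column is built with struct instead of case classes only so the snippet also runs outside spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, udf}

// Local session for experimentation only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("nested-udf-sketch")
  .getOrCreate()
import spark.implicits._

// One row with a struct column foo(bar, baz).
val df = Seq(("Hi", "There"))
  .toDF("bar", "baz")
  .select(struct($"bar", $"baz").as("foo"))

// Hypothetical UDF body, chosen only for illustration.
val shout = udf((s: String) => if (s == null) null else s.toUpperCase)

// Option 1: temporary top-level column, then rebuild the struct.
val viaWithColumn = df
  .withColumn("udfResult", shout($"foo.baz"))
  .withColumn("foo", struct($"foo.*", $"udfResult"))
  .drop("udfResult")

// Option 2: build the extended struct directly inside select.
val viaSelect = df
  .select(struct($"foo.*", shout($"foo.baz").as("udfResult")).as("foo"))

viaWithColumn.printSchema()
viaSelect.printSchema()
```

In both results foo should contain the fields bar, baz and udfResult, in that order.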
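EDIT: Replacing an existing attribute in a struct with a UDF result. Unfortunately, this does not work, because withColumn treats "foo.baz" as the literal name of a new top-level column rather than as a path into foo: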
    df
      .withColumn("foo.baz", myUDF($"foo.baz"))
but it can be done as follows:
    // get all columns except foo.baz
    val structCols = df.select($"foo.*")
      .columns
      .filter(_ != "baz")
      .map(name => col("foo." + name))

    df.withColumn(
      "foo",
      struct((structCols :+ myUDF($"foo.baz").as("baz")): _*)
    )
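As an alternative to rebuilding the struct by hand, Spark 3.1 and later provide Column.withField, which replaces (or adds) a single field by name. The sketch below assumes such a version; the local SparkSession and the shout UDF are again illustrative placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, udf}

// Local session for experimentation only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("with-field-sketch")
  .getOrCreate()
import spark.implicits._

// One row with a struct column foo(bar, baz).
val df = Seq(("Hi", "There"))
  .toDF("bar", "baz")
  .select(struct($"bar", $"baz").as("foo"))

// Hypothetical UDF body, chosen only for illustration.
val shout = udf((s: String) => if (s == null) null else s.toUpperCase)

// Requires Spark 3.1+: replace foo.baz with the UDF result, leaving foo.bar untouched.
val replaced = df.withColumn("foo", $"foo".withField("baz", shout($"foo.baz")))

replaced.printSchema()
replaced.show()
```

Unlike the manual rebuild, which appends the replaced field at the end of the struct, withField keeps an existing field in its original position.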