You can use a custom UDF to achieve what you want.
UDF Definition
```scala
object TupleUDFs {
  import org.apache.spark.sql.functions.udf

  // A TypeTag is required, as we have a generic UDF
  import scala.reflect.runtime.universe.{TypeTag, typeTag}

  def toTuple2[S: TypeTag, T: TypeTag] =
    udf[(S, T), S, T]((x: S, y: T) => (x, y))
}
```
Usage
```scala
df.withColumn(
  "tuple_col",
  TupleUDFs.toTuple2[Int, Int].apply(df("a"), df("b"))
)
```
This assumes that "a" and "b" are the Int columns you want to put in the tuple. Note that Spark has no native tuple column type, so the result is stored as a struct with fields `_1` and `_2`.
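For reference, here is a minimal end-to-end sketch. The DataFrame contents and the `local[*]` session setup are illustrative assumptions, not part of the original answer; it relies on the `TypeTag`-based `udf` overload from `org.apache.spark.sql.functions`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import scala.reflect.runtime.universe.TypeTag

object TupleUDFExample {
  // Generic two-argument UDF that packs its inputs into a tuple,
  // which Spark represents as a struct<_1, _2> column.
  def toTuple2[S: TypeTag, T: TypeTag] =
    udf[(S, T), S, T]((x: S, y: T) => (x, y))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")       // assumption: a local session for demonstration
      .appName("tuple-udf")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 10), (2, 20)).toDF("a", "b")

    val result = df.withColumn(
      "tuple_col",
      toTuple2[Int, Int].apply($"a", $"b")
    )

    result.printSchema() // tuple_col: struct (nullable = true) with _1, _2
    result.show()

    spark.stop()
  }
}
```

Once the struct column exists, its components can be read back with `$"tuple_col._1"` and `$"tuple_col._2"`.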