TL;DR There is currently no good solution, and given the implementation of Spark SQL / Dataset, it is unlikely there will be one in the foreseeable future.
You can use the generic kryo or java encoders:
val occupation: Seq[Occupation] = Seq(SoftwareEngineer, Wizard(1), Other("foo"))
spark.createDataset(occupation)(org.apache.spark.sql.Encoders.kryo[Occupation])
but this is hardly useful in practice: the whole object is serialized into a single binary column, so you lose the ability to work with individual fields through Dataset / DataFrame operations.
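For completeness, here is a minimal, self-contained sketch of the kryo-based approach; the definition of Occupation is not shown in the question, so the sealed trait below is an assumption about its shape:

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Hypothetical ADT matching the snippet above -- the real Occupation
// definition is not shown in the question.
sealed trait Occupation
case object SoftwareEngineer extends Occupation
case class Wizard(level: Int) extends Occupation
case class Other(name: String) extends Occupation

object KryoEncoderExample extends App {
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("kryo-encoder-example")
    .getOrCreate()

  // Generic binary encoder for the whole hierarchy
  implicit val occupationEncoder: Encoder[Occupation] = Encoders.kryo[Occupation]

  val occupation: Seq[Occupation] = Seq(SoftwareEngineer, Wizard(1), Other("foo"))
  val ds = spark.createDataset(occupation)

  // Everything ends up in one opaque binary column:
  // root
  //  |-- value: binary (nullable = true)
  ds.printSchema()

  spark.stop()
}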
The UDT API provides another possible approach (Spark 1.6, 2.0, 2.1-SNAPSHOT), but as of Spark 2.0 it is private and requires quite a lot of boilerplate (you can check o.a.s.ml.linalg.VectorUDT to see an example implementation).
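To give a rough idea of the boilerplate involved, below is an illustrative sketch loosely modelled on VectorUDT. The Point class, the PointUDT name and the field layout are made up for illustration; the serialize signature shown is the 2.x one (1.6 used serialize(obj: Any)); and because the API is private in 2.0+, such code effectively has to live under the org.apache.spark package (or reach the API through other workarounds):

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.sql.types._

// Hypothetical user type; the annotation ties it to its UDT.
@SQLUserDefinedType(udt = classOf[PointUDT])
case class Point(x: Double, y: Double)

class PointUDT extends UserDefinedType[Point] {

  // Catalyst-level representation of the type
  override def sqlType: DataType = StructType(Seq(
    StructField("x", DoubleType, nullable = false),
    StructField("y", DoubleType, nullable = false)))

  // Convert a user object into its Catalyst representation
  override def serialize(p: Point): Any = {
    val row = new GenericInternalRow(2)
    row.setDouble(0, p.x)
    row.setDouble(1, p.y)
    row
  }

  // Convert the Catalyst representation back into a user object
  override def deserialize(datum: Any): Point = datum match {
    case row: InternalRow => Point(row.getDouble(0), row.getDouble(1))
  }

  override def userClass: Class[Point] = classOf[Point]
}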