If you want to drop specific columns from a DataFrame based on their type, the snippet below will help. In this example, the DataFrame has two columns, one of type String and one of type Int, and every String-typed column is dropped from the schema.
import sqlContext.implicits._

// Two columns: c1 (String) and c2 (Int); zip truncates to the shorter side, so 10 rows
val df = sc.parallelize(('a' to 'l').map(_.toString) zip (1 to 10)).toDF("c1", "c2")

// Collect the names of all String-typed columns from the schema
val fields = df.schema.fields.filter { f =>
  f.dataType match {
    case _: org.apache.spark.sql.types.StringType => true
    case _ => false
  }
}.map(_.name)

// Drop them one at a time
val newDf = fields.foldLeft(df) { case (dframe, field) => dframe.drop(field) }
In the spark-shell, the result confirms that only the Int column remains:

newDf: org.apache.spark.sql.DataFrame = [c2: int]
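The filter-then-foldLeft pattern above can be sketched without a Spark cluster at hand, treating the schema as a plain Map from column name to type name. The Map and the type-name strings here are illustrative stand-ins, not Spark API:

```scala
// Sketch of the filter-then-foldLeft drop pattern on a plain Map,
// standing in for a DataFrame schema (column name -> type name).
object DropByType extends App {
  val schema = Map("c1" -> "StringType", "c2" -> "IntegerType")

  // names of all String-typed "columns"
  val toDrop = schema.collect { case (name, "StringType") => name }

  // drop them one at a time, mirroring foldLeft over dframe.drop(field)
  val remaining = toDrop.foldLeft(schema) { case (s, field) => s - field }

  println(remaining) // Map(c2 -> IntegerType)
}
```

Folding over the list of names keeps each intermediate result immutable, which is the same reason the original snippet folds `drop` over the DataFrame instead of mutating it.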