Removing columns by data type in Scala Spark

df1.printSchema() displays the column names and the type of data they possess.

df1.drop($"colName") drops a column by name.

Is there a way to adapt this command to delete by data type?

2 answers

If you want to drop specific columns from a DataFrame based on their type, the snippet below will help. In this example, I have a DataFrame with two fields, one of type String and one of type Int, and I drop every field of type String from the schema.

import sqlContext.implicits._

val df = sc.parallelize(('a' to 'j').map(_.toString) zip (1 to 10)).toDF("c1", "c2")

val fields = df.schema.fields
  .filter(_.dataType match {
    case org.apache.spark.sql.types.StringType => true
    case _                                     => false
  })
  .map(_.name)

val newDf = fields.foldLeft(df){ case(dframe,field) => dframe.drop(field) }

The resulting schema, as the REPL reports it: newDf: org.apache.spark.sql.DataFrame = [c2: int]
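Since running a full Spark session isn't possible here, below is a plain-Scala sketch of the same filter-and-fold pattern, modeling the schema as (name, typeName) pairs instead of Spark's StructField. The object DropByType and its type alias are hypothetical names introduced for illustration only.

```scala
object DropByType {
  // Hypothetical stand-in for Spark's StructType: (columnName, typeName) pairs.
  type Schema = List[(String, String)]

  // Collect the names of columns whose type matches, mirroring the
  // df.schema.fields filter ... map { _.name } step from the answer above.
  def columnsOfType(schema: Schema, tpe: String): List[String] =
    schema.filter { case (_, t) => t == tpe }.map(_._1)

  // Remove each matching column one at a time, mirroring
  // fields.foldLeft(df) { case (dframe, field) => dframe.drop(field) }.
  def dropByType(schema: Schema, tpe: String): Schema =
    columnsOfType(schema, tpe).foldLeft(schema) { (s, name) =>
      s.filterNot { case (n, _) => n == name }
    }

  def main(args: Array[String]): Unit = {
    val schema = List("c1" -> "StringType", "c2" -> "IntegerType")
    println(dropByType(schema, "StringType")) // List((c2,IntegerType))
  }
}
```

As a side note, recent Spark versions also let you drop several columns at once with the varargs overload, e.g. df.drop(fields: _*), which avoids the fold entirely.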


Here is a more compact example in Scala:

val categoricalFeatColNames = df.schema.fields
  .filter(_.dataType.isInstanceOf[org.apache.spark.sql.types.StringType])
  .map(_.name)

Source: https://habr.com/ru/post/1668165/
