Spark scala remove columns containing only null values

Is there a way to remove the columns of a Spark DataFrame that contain only null values? (I am using Scala and Spark 1.6.2.)

I am currently doing this:

var validCols: List[String] = List()
for (col <- df_filtered.columns) {
  // Count the distinct values in this column (null counts as one value).
  val count = df_filtered
    .select(col)
    .distinct
    .count
  println(col, count)
  // Keep the column only if it has at least two distinct values.
  if (count >= 2) {
    validCols ++= List(col)
  }
}

to build a list of the columns that have at least two distinct values, and then use that list in select().
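
For reference, the final select can take the list like this (a minimal sketch; functions.col is renamed to column on import so it does not clash with the loop variable col above):

import org.apache.spark.sql.functions.{col => column}

// select takes Column varargs, so map the String names to Columns first.
val dfClean = df_filtered.select(validCols.map(column): _*)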

Thanks!

1 answer

I had the same problem and came up with a similar solution in Java. In my opinion, there is currently no other way to do this.

for (String column : df.columns()) {
    long count = df.select(column).distinct().count();

    // A column containing only nulls has exactly one distinct value,
    // and that value is null.
    if (count == 1 && df.select(column).first().isNullAt(0)) {
        df = df.drop(column);
    }
}

If the distinct count is one and the first value is null, the column contains nothing but nulls and can safely be dropped.
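
As a side note, both loops above run one Spark job per column. Since the SQL aggregate count(column) ignores nulls, every column can be checked in a single aggregation pass instead: a column whose non-null count is zero contains only nulls. A minimal sketch of that idea in Scala (dropAllNullColumns is my own name, not from the original post):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count}

def dropAllNullColumns(df: DataFrame): DataFrame = {
  // count(col) skips nulls, so one aggregate row holds the
  // non-null count of every column at once.
  val counts = df.select(df.columns.map(c => count(col(c)).alias(c)): _*).first()
  // Columns whose non-null count is zero contain only nulls.
  val allNull = df.columns.filter(c => counts.getAs[Long](c) == 0L)
  // Spark 1.6's drop takes a single column name, so fold over the list.
  allNull.foldLeft(df)((acc, c) => acc.drop(c))
}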


Source: https://habr.com/ru/post/1654271/

