Spark scala remove columns containing only null values

Is there a way to remove the columns of a Spark DataFrame that contain only null values? (I am using Scala and Spark 1.6.2.)

I am currently doing this:

var validCols: List[String] = List()
for (col <- df_filtered.columns) {
  // Count the distinct values in this column (null counts as one value).
  val count = df_filtered
    .select(col)
    .distinct
    .count
  println(col, count)
  // Keep the column only if it has at least two distinct values.
  if (count >= 2) {
    validCols ++= List(col)
  }
}

to build a list of the columns that have at least two distinct values, and then use that list in select().
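
For reference, the final select can take the list like this (a minimal sketch; functions.col is renamed to column on import so it does not clash with the loop variable col above):

import org.apache.spark.sql.functions.{col => column}

// select takes Column varargs, so map the String names to Columns first.
val dfClean = df_filtered.select(validCols.map(column): _*)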

Thanks!

1 answer

I had the same problem and came up with a similar solution in Java. In my opinion, there is currently no other way to do this.

for (String column : df.columns()) {
    long count = df.select(column).distinct().count();

    // A column containing only nulls has exactly one distinct value,
    // and that value is null.
    if (count == 1 && df.select(column).first().isNullAt(0)) {
        df = df.drop(column);
    }
}

If the distinct count is one and the first value is null, the column contains nothing but nulls and can safely be dropped.
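
As a side note, both loops above run one Spark job per column. Since the SQL aggregate count(column) ignores nulls, every column can be checked in a single aggregation pass instead: a column whose non-null count is zero contains only nulls. A minimal sketch of that idea in Scala (dropAllNullColumns is my own name, not from the original post):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count}

def dropAllNullColumns(df: DataFrame): DataFrame = {
  // count(col) skips nulls, so one aggregate row holds the
  // non-null count of every column at once.
  val counts = df.select(df.columns.map(c => count(col(c)).alias(c)): _*).first()
  // Columns whose non-null count is zero contain only nulls.
  val allNull = df.columns.filter(c => counts.getAs[Long](c) == 0L)
  // Spark 1.6's drop takes a single column name, so fold over the list.
  allNull.foldLeft(df)((acc, c) => acc.drop(c))
}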


Source: https://habr.com/ru/post/1654271/

