I am surprised that none of the answers mentioned that Spark SQL comes with standard functions that meet the requirement:
For example, I may have a DataFrame with 10 features (columns), and if a row has 8 null values, I want to drop it.
You can use one of the variants of the DataFrameNaFunctions.drop method, with minNonNulls set appropriately, say to 2.
drop(minNonNulls: Int, cols: Seq[String]): DataFrame — Returns a new DataFrame that drops rows containing less than minNonNulls non-null and non-NaN values in the specified columns.
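For instance, the call shape looks like this (a minimal sketch in the spark-shell; the DataFrame df and its column names "a", "b", "c" are made up for illustration):

val ns: String = null
// hypothetical DataFrame with three string columns, some values null
val df = Seq(("x", ns, ns), ("x", "y", ns)).toDF("a", "b", "c")

// keep only rows that have at least 2 non-null values among columns a, b, c
val kept = df.na.drop(2, Seq("a", "b", "c"))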
And to handle the variability of column names, as in the requirement:
I cannot hard-code the column names and do something accordingly.
You can simply use Dataset.columns:
columns: Array[String] — Returns all column names as an array.
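As a quick illustration (again in the spark-shell, with a made-up DataFrame df), columns gives you the full list of names without spelling any of them out:

scala> val df = Seq(("a", "b", "c")).toDF
scala> df.columns
res0: Array[String] = Array(_1, _2, _3)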
Let's say you have the following dataset with 5 features (columns) and several rows with almost all values null.
val ns: String = null
val features = Seq(
  ("0", "1", "2", ns, ns),
  (ns, ns, ns, ns, ns),
  (ns, "1", ns, "2", ns)).toDF

scala> features.show
+----+----+----+----+----+
|  _1|  _2|  _3|  _4|  _5|
+----+----+----+----+----+
|   0|   1|   2|null|null|
|null|null|null|null|null|
|null|   1|null|   2|null|
+----+----+----+----+----+

// drop rows with more than (5 columns - 2) = 3 nulls
scala> features.na.drop(2, features.columns).show
+----+---+----+----+----+
|  _1| _2|  _3|  _4|  _5|
+----+---+----+----+----+
|   0|  1|   2|null|null|
|null|  1|null|   2|null|
+----+---+----+----+----+
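If it is more natural to think in terms of a maximum number of allowed nulls per row rather than a minimum number of non-null values, the threshold can be derived from the column count (a small sketch; maxNulls is a name introduced here only for illustration):

// allow at most 3 nulls per row; rows with more nulls are dropped
val maxNulls = 3
features.na.drop(features.columns.length - maxNulls, features.columns).show

With 5 columns and maxNulls = 3 this is the same call as above (minNonNulls = 2).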