Get min and max from a specific column of a Scala Spark DataFrame

I would like to get the min and max of a specific column of my DataFrame, but I don't have the column's header, only its number. How should I do this in Scala?

Maybe something like this:

    val q = nextInt(ncol) // we pick a random value for a column number
    col = df(q)
    val minimum = col.min()

Sorry if this sounds like a dumb question, but I could not find any info on SO about this :/

+5
4 answers

How about getting the column name from the DataFrame's metadata:

    import org.apache.spark.sql.functions.{min, max}

    val selectedColumnName = df.columns(q) // pull the (q + 1)th column name from the columns array
    df.agg(min(selectedColumnName), max(selectedColumnName))
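For reference, here is a minimal self-contained sketch of the same idea. The local SparkSession, the sample data, and the helper name minMaxByIndex are assumptions made for illustration; they are not part of the question.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{max, min}

object MinMaxByIndex {
  // Hypothetical helper: resolve the column name by zero-based position,
  // then compute min and max in a single aggregation.
  def minMaxByIndex(df: DataFrame, q: Int): Row = {
    require(q >= 0 && q < df.columns.length, s"column index $q is out of range")
    val name = df.columns(q)
    df.agg(min(name), max(name)).head()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the sketch out.
    val spark = SparkSession.builder().master("local[*]").appName("min-max-by-index").getOrCreate()
    import spark.implicits._

    val df = Seq((2.0, 2.1), (1.2, 1.4)).toDF("A", "B") // illustrative data
    println(minMaxByIndex(df, 1)) // prints [1.4,2.1]

    spark.stop()
  }
}
```

The result is a single Row holding the min at position 0 and the max at position 1.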
+7

You can use pattern matching when assigning a variable:

    import org.apache.spark.sql.functions.{min, max}
    import org.apache.spark.sql.Row

    val Row(minValue: Double, maxValue: Double) = df.agg(min(q), max(q)).head

Here q is either a Column or a column name (String), and the column's data type is assumed to be Double.
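If the column is numeric but not Double (for example Int), the typed pattern above does not match. One possible workaround, sketched here under the assumption that losing the original numeric type is acceptable, is to cast the column to double before aggregating. The object name, the session setup, and the sample data are illustrative only.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, max, min}

object PatternMatchMinMax {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the sketch out.
    val spark = SparkSession.builder().master("local[*]").appName("pattern-match-min-max").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2), (3, 4), (5, 6)).toDF("A", "B") // Int columns
    val q = col(df.columns(0)).cast("double")           // cast so the Double pattern matches

    // Destructure the single result row into two typed values.
    val Row(minValue: Double, maxValue: Double) = df.agg(min(q), max(q)).head()
    println(s"min=$minValue max=$maxValue") // prints min=1.0 max=5.0

    spark.stop()
  }
}
```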

+9

You can use the column number to extract the column name first (by indexing df.columns), and then aggregate using that column name:

    val df = Seq((2.0, 2.1), (1.2, 1.4)).toDF("A", "B")
    // df: org.apache.spark.sql.DataFrame = [A: double, B: double]

    df.agg(max(df(df.columns(1))), min(df(df.columns(1)))).show
    +------+------+
    |max(B)|min(B)|
    +------+------+
    |   2.1|   1.4|
    +------+------+
+5

Here is a direct way to get the min and max from a DataFrame using column names:

    val df = Seq((1, 2), (3, 4), (5, 6)).toDF("A", "B")

    df.show()
    /*
    +---+---+
    |  A|  B|
    +---+---+
    |  1|  2|
    |  3|  4|
    |  5|  6|
    +---+---+
    */

    df.agg(min("A"), max("A")).show()
    /*
    +------+------+
    |min(A)|max(A)|
    +------+------+
    |     1|     5|
    +------+------+
    */

If you want to get the min and max values as separate variables, you can convert the result of agg() above to a Row and use Row.getInt(index) to read the values from that Row.

    val min_max = df.agg(min("A"), max("A")).head()
    // min_max: org.apache.spark.sql.Row = [1,5]

    val col_min = min_max.getInt(0)
    // col_min: Int = 1

    val col_max = min_max.getInt(1)
    // col_max: Int = 5
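Note that getInt fails when the cell is null, which is what the aggregation returns for an empty DataFrame or an all-null column. A possible null-safe variant is sketched below; the helper name intAt, the local session, and the sample data are assumptions for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{max, min}

object NullSafeMinMax {
  // Hypothetical helper: wrap a possibly-null Int cell in an Option.
  def intAt(row: Row, i: Int): Option[Int] =
    if (row.isNullAt(i)) None else Some(row.getInt(i))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the sketch out.
    val spark = SparkSession.builder().master("local[*]").appName("null-safe-min-max").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2), (3, 4), (5, 6)).toDF("A", "B")
    val empty = df.filter($"A" > 100) // no rows, so min and max come back as null

    val full = df.agg(min("A"), max("A")).head()
    val none = empty.agg(min("A"), max("A")).head()

    println((intAt(full, 0), intAt(full, 1))) // prints (Some(1),Some(5))
    println((intAt(none, 0), intAt(none, 1))) // prints (None,None)

    spark.stop()
  }
}
```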
0

Source: https://habr.com/ru/post/1266361/

