Accessing a column in a DataFrame using Spark

I am working with Spark 1.6.1 and Scala and am facing an unusual problem. When I create a new column that applies an aggregate function to a column created earlier in the same chain of transformations, I get an "org.apache.spark.sql.AnalysisException".
WORKING:

val resultDataFrame = dataFrame.withColumn("FirstColumn", lit(2021)).withColumn("SecondColumn", when($"FirstColumn" - 2021 === 0, 1).otherwise(10))
resultDataFrame.printSchema()

DOES NOT WORK:

val resultDataFrame = dataFrame.withColumn("FirstColumn", lit(2021)).withColumn("SecondColumn", when($"FirstColumn" - max($"FirstColumn") === 0, 1).otherwise(10))
resultDataFrame.printSchema()

Here I create SecondColumn from the FirstColumn that was created in the same chain. The question is why this stops working as soon as aggregate functions such as avg or max are involved. Please let me know how I can solve this problem.
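
For reference, the snippets above assume the usual Spark 1.6 imports and an existing DataFrame. A minimal, hypothetical setup (the column names and values here are made up purely so the example compiles) could look like this:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

val sqlContext = new SQLContext(sc)   // sc is an existing SparkContext
import sqlContext.implicits._         // enables $"..." and toDF

// made-up input; the real dataFrame comes from the actual job
val dataFrame = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3))).toDF("name", "value")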


"" , groupBy . . :

val result = df.groupBy($"col1").agg(max($"col2").as("max"))  // this works

DataFrame "col1", "max" .

val minMax = df.select(min("col2"), max("col2"))

What you cannot do is call an aggregate function inside a row-level expression such as filter or withColumn. For example, this:

val result = df.filter($"col1" === max($"col2"))   // throws AnalysisException

fails with the same AnalysisException, because Spark has no aggregation context in which to evaluate max there.

To get the behaviour you want, compute the aggregate separately and join it back to the original DataFrame:

val maxDf = df.select(max("col2").as("maxValue"))   // one-row DataFrame holding the maximum
val joined = df.join(maxDf)                          // cartesian join against a single row
val result = joined.filter($"col1" === $"maxValue").drop("maxValue")
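
Applied back to the example from the question, the join approach might look like the following sketch (maxFirst is just an illustrative alias, and nothing here is tested against the asker's data):

// compute the max of FirstColumn as a one-row DataFrame, then join it back
val withFirst = dataFrame.withColumn("FirstColumn", lit(2021))
val maxFirst = withFirst.select(max($"FirstColumn").as("maxFirst"))

val resultDataFrame = withFirst
  .join(maxFirst)                     // single-row DataFrame, so the cartesian join is cheap
  .withColumn("SecondColumn", when($"FirstColumn" - $"maxFirst" === 0, 1).otherwise(10))
  .drop("maxFirst")

Since maxFirst has exactly one row, the join only replicates a single value across the rows of the original DataFrame.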

Or collect the maximum to the driver and use it as a literal:

val maxValue = df.select(max("col2")).first.get(0)   // pull the single aggregated value to the driver
val result = df.filter($"col1" === maxValue)
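
The same idea applied to the question's FirstColumn / SecondColumn example, collecting the aggregate to the driver first (again only a sketch under the same assumptions):

val withFirst = dataFrame.withColumn("FirstColumn", lit(2021))
val maxFirst = withFirst.select(max($"FirstColumn")).first.getInt(0)   // a plain Int on the driver

val resultDataFrame = withFirst.withColumn("SecondColumn", when($"FirstColumn" - maxFirst === 0, 1).otherwise(10))

Collecting is fine when the aggregate is a single small value; the join variant keeps the whole computation on the cluster.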

Source: https://habr.com/ru/post/1651020/

