Accessing a column in a DataFrame using Spark

I am working with Spark 1.6.1 and Scala and am facing an unusual problem. When I create a new column that applies an aggregate function to a column created earlier in the same chain of transformations, I get an "org.apache.spark.sql.AnalysisException".
WORKING:

val resultDataFrame = dataFrame.withColumn("FirstColumn", lit(2021)).withColumn("SecondColumn", when($"FirstColumn" - 2021 === 0, 1).otherwise(10))
resultDataFrame.printSchema()

DOES NOT WORK:

val resultDataFrame = dataFrame.withColumn("FirstColumn", lit(2021)).withColumn("SecondColumn", when($"FirstColumn" - max($"FirstColumn") === 0, 1).otherwise(10))
resultDataFrame.printSchema()

Here I create SecondColumn from the FirstColumn that was created in the same chain. The question is why this stops working as soon as aggregate functions such as avg or max are involved. Please let me know how I can solve this problem.
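
For reference, the snippets above assume the usual Spark 1.6 imports and an existing DataFrame. A minimal, hypothetical setup (the column names and values here are made up purely so the example compiles) could look like this:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

val sqlContext = new SQLContext(sc)   // sc is an existing SparkContext
import sqlContext.implicits._         // enables $"..." and toDF

// made-up input; the real dataFrame comes from the actual job
val dataFrame = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3))).toDF("name", "value")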


"" , groupBy . . :

val result = df.groupBy($"col1").agg(max($"col2").as("max"))  // this works

DataFrame "col1", "max" .

val minMax = df.select(min("col2"), max("col2"))

What you cannot do is call an aggregate function inside a row-level expression such as filter or withColumn. For example, this:

val result = df.filter($"col1" === max($"col2"))   // throws AnalysisException

fails with the same AnalysisException, because Spark has no aggregation context in which to evaluate max there.

To get the behaviour you want, compute the aggregate separately and join it back to the original DataFrame:

val maxDf = df.select(max("col2").as("maxValue"))   // one-row DataFrame holding the maximum
val joined = df.join(maxDf)                          // cartesian join against a single row
val result = joined.filter($"col1" === $"maxValue").drop("maxValue")
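
Applied back to the example from the question, the join approach might look like the following sketch (maxFirst is just an illustrative alias, and nothing here is tested against the asker's data):

// compute the max of FirstColumn as a one-row DataFrame, then join it back
val withFirst = dataFrame.withColumn("FirstColumn", lit(2021))
val maxFirst = withFirst.select(max($"FirstColumn").as("maxFirst"))

val resultDataFrame = withFirst
  .join(maxFirst)                     // single-row DataFrame, so the cartesian join is cheap
  .withColumn("SecondColumn", when($"FirstColumn" - $"maxFirst" === 0, 1).otherwise(10))
  .drop("maxFirst")

Since maxFirst has exactly one row, the join only replicates a single value across the rows of the original DataFrame.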

Or collect the maximum to the driver and use it as a literal:

val maxValue = df.select(max("col2")).first.get(0)   // pull the single aggregated value to the driver
val result = df.filter($"col1" === maxValue)
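
The same idea applied to the question's FirstColumn / SecondColumn example, collecting the aggregate to the driver first (again only a sketch under the same assumptions):

val withFirst = dataFrame.withColumn("FirstColumn", lit(2021))
val maxFirst = withFirst.select(max($"FirstColumn")).first.getInt(0)   // a plain Int on the driver

val resultDataFrame = withFirst.withColumn("SecondColumn", when($"FirstColumn" - maxFirst === 0, 1).otherwise(10))

Collecting is fine when the aggregate is a single small value; the join variant keeps the whole computation on the cluster.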

Source: https://habr.com/ru/post/1651020/

