The max() and sum() methods are undefined in the Java Spark DataFrame API (1.4.1)

I put the sample code DataFrame.groupBy() into my code, but the methods max() and sum() are reported as undefined.

df.groupBy("department").agg(max("age"), sum("expense"));

Which Java package should I import if I want to use the max() and sum() methods?

Is the syntax for this sample code correct?

4 answers

The import alone did not work for me; the Eclipse IDE still reported a compilation error.

But the following fully qualified method call worked:

df.groupBy("Gender").agg(org.apache.spark.sql.functions.max(df.col("Id")), org.apache.spark.sql.functions.sum(df.col("Income")));

If the aggregation involves only one field, you can also use the following syntax:

df.groupBy("Gender").max("Income");
import static org.apache.spark.sql.functions.*;

Try a static import of all the functions, including max and sum.
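With that static import in place, the aggregation from the question should compile as written. Here is a minimal sketch (the class name `AggExample` and the helper method are illustrative; the DataFrame and its "department", "age", and "expense" columns are assumed from the question, Spark 1.4.x API):

```java
import static org.apache.spark.sql.functions.max;
import static org.apache.spark.sql.functions.sum;

import org.apache.spark.sql.DataFrame;

public class AggExample {
    // Assumes `df` has "department", "age" and "expense" columns,
    // as in the question.
    public static DataFrame aggregate(DataFrame df) {
        // max() and sum() resolve to org.apache.spark.sql.functions
        // thanks to the static imports above, so no qualification is needed.
        return df.groupBy("department").agg(max("age"), sum("expense"));
    }
}
```

Note that `functions.max` and `functions.sum` each have overloads taking either a column name (`String`) or a `Column`, so both `max("age")` and `max(df.col("age"))` compile.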


import static org.apache.spark.sql.functions.*;

From what I can see, you are using Scala syntax when trying to access columns via the apply method. In Java you need to pass Column objects, for example via the .col method:

df.groupBy("department").agg(max(df.col("age")), sum(df.col("expense")));

See the Java examples in the Spark documentation.


It seems that you are looking for org.apache.spark.sql.GroupedData.

To use them in your code as you wrote it, you will need a static import.

See the Spark API documentation.

Always try to read the API descriptions first.
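For context: groupBy() returns a GroupedData object, and that class also defines shorthand aggregates such as max, min, sum, avg, and count for single columns by name. A small sketch (the class name and column names "Gender" and "Income" are taken from the earlier answer; Spark 1.4.x API):

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.GroupedData;

public class GroupedDataExample {
    // Assumes `df` has "Gender" and "Income" columns, as in the earlier answer.
    public static DataFrame maxIncomeByGender(DataFrame df) {
        // groupBy returns GroupedData; its max/sum/avg/min/count methods
        // take column names directly, with no static import required.
        GroupedData grouped = df.groupBy("Gender");
        return grouped.max("Income");
    }
}
```

This is why df.groupBy("Gender").max("Income") compiles without any import of org.apache.spark.sql.functions: max here is a method on GroupedData, not the standalone function from the question.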


Source: https://habr.com/ru/post/1606350/
