The max() and sum() methods are undefined in the Java Spark DataFrame API (1.4.1)

I put the sample code DataFrame.groupBy() into my code, but the methods max() and sum() are reported as undefined.

df.groupBy("department").agg(max("age"), sum("expense"));

Which Java package should I import if I want to use the max() and sum() methods?

Is the syntax for this sample code correct?

4 answers

The import alone did not work for me; the Eclipse IDE still reported a compilation error.

But the following fully qualified method call worked:

df.groupBy("Gender").agg(org.apache.spark.sql.functions.max(df.col("Id")), org.apache.spark.sql.functions.sum(df.col("Income")));

If the aggregation involves only one field, you can also use the following syntax:

df.groupBy("Gender").max("Income");
import static org.apache.spark.sql.functions.*;

Try a static import of all the functions, including max and sum.
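With that static import in place, the aggregation from the question should compile as written. Here is a minimal sketch (the class name `AggExample` and the helper method are illustrative; the DataFrame and its "department", "age", and "expense" columns are assumed from the question, Spark 1.4.x API):

```java
import static org.apache.spark.sql.functions.max;
import static org.apache.spark.sql.functions.sum;

import org.apache.spark.sql.DataFrame;

public class AggExample {
    // Assumes `df` has "department", "age" and "expense" columns,
    // as in the question.
    public static DataFrame aggregate(DataFrame df) {
        // max() and sum() resolve to org.apache.spark.sql.functions
        // thanks to the static imports above, so no qualification is needed.
        return df.groupBy("department").agg(max("age"), sum("expense"));
    }
}
```

Note that `functions.max` and `functions.sum` each have overloads taking either a column name (`String`) or a `Column`, so both `max("age")` and `max(df.col("age"))` compile.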


import static org.apache.spark.sql.functions.*;

From what I can see, you are using Scala syntax when trying to access columns via the apply method. In Java you need to pass Column objects, for example via the .col method:

df.groupBy("department").agg(max(df.col("age")), sum(df.col("expense")));

See the Java examples in the Spark documentation.


It seems that you are looking for org.apache.spark.sql.GroupedData.

To use them in your code as you wrote it, you will need a static import.

See the Spark API documentation.

Always try to read the API descriptions first.
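For context: groupBy() returns a GroupedData object, and that class also defines shorthand aggregates such as max, min, sum, avg, and count for single columns by name. A small sketch (the class name and column names "Gender" and "Income" are taken from the earlier answer; Spark 1.4.x API):

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.GroupedData;

public class GroupedDataExample {
    // Assumes `df` has "Gender" and "Income" columns, as in the earlier answer.
    public static DataFrame maxIncomeByGender(DataFrame df) {
        // groupBy returns GroupedData; its max/sum/avg/min/count methods
        // take column names directly, with no static import required.
        GroupedData grouped = df.groupBy("Gender");
        return grouped.max("Income");
    }
}
```

This is why df.groupBy("Gender").max("Income") compiles without any import of org.apache.spark.sql.functions: max here is a method on GroupedData, not the standalone function from the question.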


Source: https://habr.com/ru/post/1606350/
