How to calculate the count and the sum within a single group?

Based on the following DataFrame:

 val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt")

 +---+-----+----+
 | ID|Categ|Amnt|
 +---+-----+----+
 |  1|    A|  10|
 |  2|    A|   5|
 |  3|    B|  56|
 +---+-----+----+

I would like to get the number of IDs and the total amount per category:

 +-----+-----+---------+
 |Categ|count|sum(Amnt)|
 +-----+-----+---------+
 |    B|    1|       56|
 |    A|    2|       15|
 +-----+-----+---------+

Can I compute both the count and the sum without resorting to a join, as I do here?

 client.groupBy("Categ").count
   .join(client.withColumnRenamed("Categ","cat")
     .groupBy("cat")
     .sum("Amnt"), 'Categ === 'cat)
   .drop("cat")

Maybe something like this:

 client.createOrReplaceTempView("client")
 spark.sql("SELECT Categ, count(Categ), sum(Amnt) FROM client GROUP BY Categ").show()
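For reference, both aggregates can come out of a single pass, with no join and no temporary view. A minimal sketch using only the client DataFrame defined above (the answers below expand on this):

 import org.apache.spark.sql.functions.{count, sum}

 // One groupBy with two aggregate expressions replaces the self-join.
 client.groupBy("Categ")
   .agg(count("ID").as("count"), sum("Amnt").as("sum(Amnt)"))
   .show()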
3 answers

Let me give an example slightly different from yours.

Several aggregate functions can be combined in a single call; adapt accordingly:

 // In 1.3.x, in order for the grouping column "department" to show up,
 // it must be included explicitly as part of the agg function call.
 df.groupBy("department").agg($"department", max("age"), sum("expense"))

 // In 1.4+, the grouping column "department" is included automatically.
 df.groupBy("department").agg(max("age"), sum("expense"))

 import org.apache.spark.sql.{DataFrame, SparkSession}
 import org.apache.spark.sql.functions._
 import spark.implicits._

 val client: DataFrame = spark.sparkContext.parallelize(
   Seq((1,"A",10),(2,"A",5),(3,"B",56))
 ).toDF("ID","Categ","Amnt")

 client.groupBy("Categ").agg(sum("Amnt"),count("ID")).show()

 +-----+---------+---------+
 |Categ|sum(Amnt)|count(ID)|
 +-----+---------+---------+
 |    B|       56|        1|
 |    A|       15|        2|
 +-----+---------+---------+
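One caveat: groupBy makes no promise about output row order (here B happens to come before A). If you need the categories sorted, add an explicit orderBy; a minimal sketch on the same DataFrame:

 import org.apache.spark.sql.functions.{count, sum}

 client.groupBy("Categ")
   .agg(sum("Amnt"), count("ID"))
   .orderBy("Categ")   // deterministic output order: A before B
   .show()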

You can perform the aggregation as shown below:

 client.groupBy("Categ").agg(sum("Amnt"),count("ID")).show()

 +-----+---------+---------+
 |Categ|sum(Amnt)|count(ID)|
 +-----+---------+---------+
 |    A|       15|        2|
 |    B|       56|        1|
 +-----+---------+---------+
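A variant not in the original answer, in case the same ID could appear more than once within a category: countDistinct counts unique IDs instead of rows.

 import org.apache.spark.sql.functions.{countDistinct, sum}

 // Counts each distinct ID once per category; on this sample data the
 // result matches count("ID"), since all IDs are unique.
 client.groupBy("Categ")
   .agg(sum("Amnt"), countDistinct("ID").as("distinct_IDs"))
   .show()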

There are several ways to apply aggregate functions in Spark:

 val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt") 

1.

 val aggdf = client.groupBy('Categ).agg(Map("ID"->"count","Amnt"->"sum"))

 +-----+---------+---------+
 |Categ|count(ID)|sum(Amnt)|
 +-----+---------+---------+
 |B    |1        |56       |
 |A    |2        |15       |
 +-----+---------+---------+

 // Rename and sort as needed.
 aggdf.sort('Categ)
   .withColumnRenamed("count(ID)","Count")
   .withColumnRenamed("sum(Amnt)","sum")

 +-----+-----+---+
 |Categ|Count|sum|
 +-----+-----+---+
 |A    |2    |15 |
 |B    |1    |56 |
 +-----+-----+---+

2.

 import org.apache.spark.sql.functions._

 client.groupBy('Categ).agg(count("ID").as("count"),sum("Amnt").as("sum"))

 +-----+-----+---+
 |Categ|count|sum|
 +-----+-----+---+
 |B    |1    |56 |
 |A    |2    |15 |
 +-----+-----+---+

3.

 import com.google.common.collect.ImmutableMap

 client.groupBy('Categ).agg(ImmutableMap.of("ID", "count", "Amnt", "sum"))

 +-----+---------+---------+
 |Categ|count(ID)|sum(Amnt)|
 +-----+---------+---------+
 |B    |1        |56       |
 |A    |2        |15       |
 +-----+---------+---------+

 // Rename columns as needed.

4. If you are comfortable with SQL, you can do it that way too:

 client.createOrReplaceTempView("df")
 val aggdf = spark.sql("select Categ, count(ID), sum(Amnt) from df group by Categ")
 aggdf.show()

 +-----+---------+---------+
 |Categ|count(ID)|sum(Amnt)|
 +-----+---------+---------+
 |    B|        1|       56|
 |    A|        2|       15|
 +-----+---------+---------+
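If you want friendlier headers from the SQL route as well, aliases work directly in the query (same data and grouping as above; the alias names are just illustrative):

 val renamed = spark.sql(
   "select Categ, count(ID) as count, sum(Amnt) as sum from df group by Categ")
 renamed.show()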

Source: https://habr.com/ru/post/1012075/

