Apache Spark DataFrame groupBy agg() for multiple columns

I have a DataFrame with three columns: id, first name, and last name.

I want to group by id and collect the first name and last name columns as lists.

Example: I have a DataFrame like this:

 +---+-------+--------+
 |id |fName  |lName   |
 +---+-------+--------+
 |1  |Akash  |Sethi   |
 |2  |Kunal  |Kapoor  |
 |3  |Rishabh|Verma   |
 |2  |Sonu   |Mehrotra|
 +---+-------+--------+

and I want my result to look like this:

 +---+-------------+------------------+
 |id |fName        |lName             |
 +---+-------------+------------------+
 |1  |[Akash]      |[Sethi]           |
 |2  |[Kunal, Sonu]|[Kapoor, Mehrotra]|
 |3  |[Rishabh]    |[Verma]           |
 +---+-------------+------------------+

Thanks in advance.

1 answer

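You can aggregate several columns in a single agg() call:

 df.groupBy("id").agg(collect_list("fName").alias("fName"), collect_list("lName").alias("lName"))

This gives the expected result. collect_list comes from the Spark SQL functions module and must be imported first. The alias() calls keep the original column names; without them the result columns get generated names such as collect_list(fName). Note that the order of elements inside each list is not guaranteed, because it depends on the order of the rows after the shuffle.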
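To make the semantics of the aggregation concrete, here is a minimal plain-Python sketch (no Spark involved) of what grouping by id and collecting the two name columns computes. The helper name group_collect is hypothetical and only for illustration; the rows are the sample data from the question.

```python
from collections import defaultdict

# Sample rows from the question: (id, fName, lName)
rows = [
    (1, "Akash", "Sethi"),
    (2, "Kunal", "Kapoor"),
    (3, "Rishabh", "Verma"),
    (2, "Sonu", "Mehrotra"),
]


def group_collect(rows):
    """Group rows by id and collect each remaining column into a list.

    Hypothetical helper that mimics groupBy("id") followed by one
    collect_list per column; it is not part of any Spark API.
    """
    groups = defaultdict(lambda: ([], []))
    for row_id, f_name, l_name in rows:
        f_names, l_names = groups[row_id]
        f_names.append(f_name)
        l_names.append(l_name)
    # Sort by id only so the printed output is stable
    return [(row_id, f, l) for row_id, (f, l) in sorted(groups.items())]


result = group_collect(rows)
for row in result:
    print(row)
```

Unlike this sketch, which keeps input order inside each list, Spark makes no ordering promise for the collected elements.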


Source: https://habr.com/ru/post/1265591/