Use the size function:
size(e: Column): Column
Returns the length of an array or map.
The following example is in Scala, but the general idea is exactly the same in any language Spark supports; a Java version is given further down.
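import org.apache.spark.sql.functions._ // collect_list, concat_ws, size
import spark.implicits._                // enables the $"colName" syntax
val input = spark.range(4)
  .withColumn("COL1", $"id" % 2)
  .select($"COL1", $"id" as "COL2")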
scala> input.show
+----+----+
|COL1|COL2|
+----+----+
|   0|   0|
|   1|   1|
|   0|   2|
|   1|   3|
+----+----+
val s = input
  .groupBy("COL1")
  .agg(
    concat_ws(",", collect_list("COL2")) as "concat",
    size(collect_list("COL2")) as "size")
scala> s.show
+----+------+----+
|COL1|concat|size|
+----+------+----+
|   0|   0,2|   2|
|   1|   1,3|   2|
+----+------+----+
In Java, the equivalent is as follows, where df is the same two-column (COL1, COL2) Dataset as input above. Thanks to Krishna Prasad for sharing the code with the SO / Spark community!
import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
Dataset<Row> ds = df.groupBy("COL1").agg(
    concat_ws(",", collect_list("COL2")).as("sample"),
    size(collect_list("COL2")).as("size"));
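To make it clear what the aggregation computes per group, here is a plain-Java analogue with no Spark dependency. It is only a sketch of the semantics, not Spark's implementation: the class GroupSizeSketch, its nested Agg holder and the aggregate method are hypothetical names chosen for illustration. It builds the same rows (COL1 = id % 2, COL2 = id), groups them by COL1, then joins each group's values with "," (like concat_ws over collect_list) and counts them (like size over collect_list). One difference worth knowing: Spark documents collect_list as non-deterministic in element order after a shuffle, so in Spark the concat column could come out as 2,0 instead of 0,2, whereas this sketch always keeps encounter order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical, Spark-free illustration of:
// groupBy("COL1").agg(concat_ws(",", collect_list("COL2")), size(collect_list("COL2")))
public class GroupSizeSketch {

    // One aggregated output row for a given COL1 value.
    static final class Agg {
        final String concat; // analogue of concat_ws(",", collect_list(COL2))
        final int size;      // analogue of size(collect_list(COL2))

        Agg(String concat, int size) {
            this.concat = concat;
            this.size = size;
        }
    }

    // Builds rows (COL1, COL2) = (id % 2, id) for id in [0, n), then aggregates per COL1.
    static Map<Long, Agg> aggregate(long n) {
        // Group COL2 values by COL1; TreeMap keeps the keys sorted, toList keeps encounter order.
        Map<Long, List<Long>> grouped = LongStream.range(0, n).boxed()
            .collect(Collectors.groupingBy(id -> id % 2, TreeMap::new, Collectors.toList()));

        Map<Long, Agg> result = new LinkedHashMap<>();
        grouped.forEach((col1, values) -> result.put(col1, new Agg(
            values.stream().map(String::valueOf).collect(Collectors.joining(",")),
            values.size())));
        return result;
    }

    public static void main(String[] args) {
        aggregate(4).forEach((col1, agg) ->
            System.out.println(col1 + " | " + agg.concat + " | " + agg.size));
    }
}
```

Running main prints one line per group, 0 | 0,2 | 2 and 1 | 1,3 | 2, which matches the s.show output above.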