How to display KeyValueGroupedDataset in Spark?

Question


I am trying to learn Datasets in Spark. The one thing I can't figure out is how to display a KeyValueGroupedDataset, because show doesn't work on it. Also, what is the equivalent of map for a KeyValueGroupedDataset? I would appreciate it if someone could give some examples.


scala dataset apache-spark rdd

pythonic May 11 '17 at 14:48


1 answer

pythonic · Accepted Answer · 2017-05-11T15:15:24+0000

OK, I got the idea from the examples here and here. Below is a simple example that I wrote.

val x = Seq(("a", 36), ("b", 33), ("c", 40), ("a", 38), ("c", 39)).toDS
x: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

val g = x.groupByKey(_._1)
g: org.apache.spark.sql.KeyValueGroupedDataset[String,(String, Int)] = ...

val z = g.mapGroups{case(k, iter) => (k, iter.map(x => x._2).toArray)}
z: org.apache.spark.sql.Dataset[(String, Array[Int])] = [_1: string, _2: array<int>]

z.show
+---+--------+
| _1|      _2|
+---+--------+
|  c|[40, 39]|
|  b|    [33]|
|  a|[36, 38]|
+---+--------+
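
For anyone who wants to go a bit further than mapGroups: a KeyValueGroupedDataset has no show of its own, so the trick is always to turn it back into a regular Dataset first. Here is a rough sketch of a few other methods that do this (keys, count, mapValues, reduceGroups, flatMapGroups). The object name, the local master setting, and the "older than 35" filter are just my own choices for illustration, and mapValues assumes Spark 2.1 or later.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; in spark-shell you can skip the session setup
// because `spark` and its implicits are already available.
object GroupedDatasetDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: run locally for the demo
      .appName("GroupedDatasetDemo")
      .getOrCreate()
    import spark.implicits._

    val x = Seq(("a", 36), ("b", 33), ("c", 40), ("a", 38), ("c", 39)).toDS()
    val g = x.groupByKey(_._1)       // KeyValueGroupedDataset[String, (String, Int)]

    // Each of these returns a plain Dataset, which does have show().
    g.keys.show()                    // the distinct keys
    g.count().show()                 // (key, number of rows in that group)

    // mapValues is the closest thing to map: it transforms every value
    // but keeps the grouping, so the result is still a KeyValueGroupedDataset.
    val ages = g.mapValues(_._2)     // KeyValueGroupedDataset[String, Int]

    // reduceGroups collapses each group to a single (key, value) row.
    ages.reduceGroups((a: Int, b: Int) => a + b).show()

    // flatMapGroups may emit zero or more rows per group.
    g.flatMapGroups { (k: String, iter: Iterator[(String, Int)]) =>
      iter.map(_._2).filter(_ > 35).map(v => (k, v))
    }.show()

    spark.stop()
  }
}
```

The explicit parameter types on the lambdas are there only to keep the compiler from getting confused by the Java-friendly overloads of these methods; in many setups the shorter forms work too.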
