GroupedData cannot be used directly. The data is not physically grouped; groupBy is only a logical operation. You have to apply some variant of the agg method, for example:
events
  .groupBy($"service_id", $"client_create_timestamp", $"client_id")
  .min("client_send_timestamp")
or
events
  .groupBy($"service_id", $"client_create_timestamp", $"client_id")
  .agg(min($"client_send_timestamp"))
where client_send_timestamp is the column you want to aggregate.
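To make the aggregation concrete, here is a minimal self-contained sketch that builds a tiny events DataFrame and takes the earliest client_send_timestamp per group. The sample rows, the local SparkSession, the output column name first_send_timestamp, and the object name GroupedMinSketch are assumptions for illustration; only the grouping and aggregated column names come from the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, min}

object GroupedMinSketch {
  // Earliest send timestamp per (service_id, client_create_timestamp, client_id).
  def earliestSend(events: DataFrame): DataFrame =
    events
      .groupBy(col("service_id"), col("client_create_timestamp"), col("client_id"))
      .agg(min(col("client_send_timestamp")).as("first_send_timestamp"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("grouped-min-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; timestamps are plain Longs here.
    val events = Seq(
      (1, 100L, "a", 130L),
      (1, 100L, "a", 110L),
      (2, 200L, "b", 250L)
    ).toDF("service_id", "client_create_timestamp", "client_id", "client_send_timestamp")

    earliestSend(events).show()
    spark.stop()
  }
}
```

One difference between the two forms is worth knowing: as far as I know, the .min("...") shortcut on the grouped object only accepts numeric columns, so agg(min(...)) is the safer choice if client_send_timestamp is an actual timestamp type.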
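If you want to keep the rest of each row rather than only the aggregated value, join the aggregated result back to the original Spark DataFrame.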
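For the join route, one possible shape is to compute the per-group minimum and join it back on the grouping keys plus the aggregated column, so that whole rows survive. This is a sketch under assumptions, not necessarily the approach the answer links to: the payload column, the sample rows, and the object name JoinBackSketch are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, min}

object JoinBackSketch {
  private val keys = Seq("service_id", "client_create_timestamp", "client_id")

  // Keep whole rows whose client_send_timestamp equals the minimum of their group.
  def rowsWithEarliestSend(events: DataFrame): DataFrame = {
    val earliest = events
      .groupBy(keys.map(col): _*)
      .agg(min(col("client_send_timestamp")).as("client_send_timestamp"))

    // An inner join on the keys plus the aggregated column drops the non-minimal rows.
    events.join(earliest, keys :+ "client_send_timestamp")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-back-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with an extra payload column to carry along.
    val events = Seq(
      (1, 100L, "a", 130L, "late"),
      (1, 100L, "a", 110L, "early"),
      (2, 200L, "b", 250L, "only")
    ).toDF("service_id", "client_create_timestamp", "client_id", "client_send_timestamp", "payload")

    rowsWithEarliestSend(events).show()
    spark.stop()
  }
}
```

Note that if several rows in a group share the same minimum timestamp, all of them are kept.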
Spark , , - . Spark SQL?
In Spark 2.0+ you can also use Dataset.groupByKey, which exposes the rows of each group as an iterator.
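A minimal sketch of the Dataset.groupByKey route, assuming a typed Dataset: the Event case class, its field types, the sample rows, and the object name GroupByKeySketch are hypothetical, with only the field names taken from the answer.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; only the field names come from the answer.
case class Event(
  service_id: Int,
  client_create_timestamp: Long,
  client_id: String,
  client_send_timestamp: Long
)

object GroupByKeySketch {
  def earliestPerGroup(spark: SparkSession, events: Dataset[Event]): Dataset[Event] = {
    import spark.implicits._ // encoders for the tuple key and for Event
    events
      .groupByKey(e => (e.service_id, e.client_create_timestamp, e.client_id))
      // Each group arrives as an Iterator; keep the event with the smallest send timestamp.
      .mapGroups((_, rows) => rows.minBy(_.client_send_timestamp))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-by-key-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val events = Seq(
      Event(1, 100L, "a", 130L),
      Event(1, 100L, "a", 110L),
      Event(2, 200L, "b", 250L)
    ).toDS()

    earliestPerGroup(spark, events).show()
    spark.stop()
  }
}
```

Keep in mind that mapGroups does not support partial aggregation and shuffles all the data, so for a plain minimum the agg form is usually the better fit; groupByKey is more useful when you need arbitrary logic over the rows of a group.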