Hive

Question

Here is the table:

+------+------+
| Name | Time |
+------+------+
| A    | 1    |
| A    | 2    |
| A    | 3    |
| A    | 4    |
| B    | 5    |
| B    | 6    |
| A    | 7    |
| B    | 8    |
| B    | 9    |
| B    | 10   |
+------+------+

I want to write a query that returns:

+------+-------+-----+
| Name | Start | End |
+------+-------+-----+
| A    | 1     | 4   |
| B    | 5     | 6   |
| A    | 7     | 7   |
| B    | 8     | 10  |
+------+-------+-----+

Does anyone know how to do this?

sql group-by hive hiveql

Gogoogo Apr 2 '16 at 7:44

1 answer

hlagos · Answer 1 · 2017-01-12T20:40:59+0000

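This is not the most efficient approach, but it works. Within each name, time minus its dense rank stays constant across a run of consecutive times (assuming time is an integer that increases by 1 within a run), so grouping by name and that difference yields one row per run.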

    SELECT name,
           MIN(time) AS start,
           MAX(time) AS `end`  -- END is a reserved word in Hive, so it is quoted here
    FROM (
        SELECT name,
               time,
               time - DENSE_RANK() OVER (PARTITION BY name ORDER BY time) AS diff
        FROM foo
    ) t
    GROUP BY name, diff;
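To see why grouping on that difference isolates each run, here is a minimal Python sketch that mimics the query on the sample rows from the question. It is only an illustration of the logic, not how Hive executes it; the function name `islands` is made up for this example, and it assumes integer times that step by 1 inside a run.

```python
from itertools import groupby

# Sample rows from the question: (name, time)
rows = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("B", 5), ("B", 6),
        ("A", 7), ("B", 8), ("B", 9), ("B", 10)]


def islands(rows):
    """Mimic: diff = time - DENSE_RANK() per name, then GROUP BY name, diff."""
    by_name = {}
    for name, time in rows:
        # A set drops duplicate times, which is what DENSE_RANK effectively does
        by_name.setdefault(name, set()).add(time)

    result = []
    for name, times in by_name.items():
        # Position in the sorted distinct times (1-based) is the dense rank
        keyed = [(t - rank, t) for rank, t in enumerate(sorted(times), start=1)]
        # diff never decreases along sorted times, so equal diffs are adjacent
        for _, group in groupby(keyed, key=lambda pair: pair[0]):
            ts = [t for _, t in group]
            result.append((name, min(ts), max(ts)))
    # Sort by start so the rows appear in the order shown in the question
    return sorted(result, key=lambda r: r[1])


print(islands(rows))
# [('A', 1, 4), ('B', 5, 6), ('A', 7, 7), ('B', 8, 10)]
```

For A the differences are 0, 0, 0, 0, 2 and for B they are 4, 4, 5, 5, 5, which gives the four groups. Note that the Hive query itself has no ORDER BY, so its row order is not guaranteed.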

Alternatively, I would suggest trying the following query and writing a GenericUDF to find the gaps in each sorted array, which is much easier :)

 SELECT name, sort_array(collect_list(time)) FROM foo GROUP BY name;
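The answer does not show the UDF. As a rough idea of the logic it would need, here is a Python sketch that splits one sorted array, as returned by sort_array(collect_list(time)), into runs of consecutive values. A real Hive UDF would be a Java class extending GenericUDF; the function name `consecutive_runs` is hypothetical, and integer times are assumed.

```python
def consecutive_runs(times):
    """Split a sorted list of integers into (start, end) runs of consecutive values."""
    runs = []
    for t in times:
        if runs and t - runs[-1][1] <= 1:
            # t continues the current run (or repeats its last value)
            runs[-1] = (runs[-1][0], t)
        else:
            # A gap larger than 1 starts a new run
            runs.append((t, t))
    return runs


print(consecutive_runs([1, 2, 3, 4, 7]))   # times collected for A: [(1, 4), (7, 7)]
print(consecutive_runs([5, 6, 8, 9, 10]))  # times collected for B: [(5, 6), (8, 10)]
```

Each (start, end) pair corresponds to one output row for that name.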
