Hive

Here is the table:

+------+------+
| Name | Time |
+------+------+
| A    | 1    |
| A    | 2    |
| A    | 3    |
| A    | 4    |
| B    | 5    |
| B    | 6    |
| A    | 7    |
| B    | 8    |
| B    | 9    |
| B    | 10   |
+------+------+

I want to write a query that returns:

+------+-------+-----+
| Name | Start | End |
+------+-------+-----+
| A    | 1     | 4   |
| B    | 5     | 6   |
| A    | 7     | 7   |
| B    | 8     | 10  |
+------+-------+-----+

Does anyone know how to do this?

1 answer

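This is not the most efficient way, but it works. Within a run of consecutive times for the same name, time minus its rank stays constant, so grouping by that difference separates the runs. It relies on the times being consecutive integers, as in the sample data.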

    SELECT name, min(time) AS start, max(time) AS end
    FROM (
        SELECT name, time,
               time - DENSE_RANK() OVER (PARTITION BY name ORDER BY time) AS diff
        FROM foo
    ) t
    GROUP BY name, diff;
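A quick way to check the query's logic outside Hive is to recompute the diff column by hand. The Python sketch below does that for the sample rows from the question. It is only an illustration, not Hive code: the names `rows` and `islands` are made up here, and it assumes integer times that increase by 1 within a run.

```python
from collections import defaultdict

# Sample rows from the question: (name, time)
rows = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("B", 5),
        ("B", 6), ("A", 7), ("B", 8), ("B", 9), ("B", 10)]

def islands(rows):
    """Mimic: time - DENSE_RANK() OVER (PARTITION BY name ORDER BY time)."""
    by_name = defaultdict(list)
    for name, time in rows:
        by_name[name].append(time)

    groups = defaultdict(list)
    for name, times in by_name.items():
        # Dense rank = 1-based position among the distinct sorted times
        for rank, time in enumerate(sorted(set(times)), start=1):
            groups[(name, time - rank)].append(time)

    # GROUP BY name, diff -> min(time), max(time)
    result = [(name, min(ts), max(ts)) for (name, _), ts in groups.items()]
    return sorted(result, key=lambda r: r[1])  # order by start for readability

for row in islands(rows):
    print(row)
```

For A the differences are 0, 0, 0, 0, 2 and for B they are 4, 4, 5, 5, 5, which gives the four groups the question asks for. Note that the sort by start is added in the sketch; the Hive query itself has no ORDER BY, so its row order is not guaranteed.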

I would suggest trying the following query instead and writing a GenericUDF to find the gaps in each sorted array, which is much easier :)

    SELECT name, sort_array(collect_list(time))
    FROM foo
    GROUP BY name;
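The answer does not show the UDF, so the following is only a guess at the logic it would need: take one sorted array per name and collapse it into (start, end) runs. It is sketched in Python for brevity, whereas a real GenericUDF would be written in Java against Hive's UDF API. The function name `split_runs` is hypothetical.

```python
def split_runs(sorted_times):
    """Collapse a sorted list of integers into (start, end) runs of consecutive values."""
    runs = []
    for t in sorted_times:
        if runs and t - runs[-1][1] <= 1:
            runs[-1] = (runs[-1][0], t)   # consecutive (or duplicate): extend the run
        else:
            runs.append((t, t))           # gap found: start a new run
    return runs

print(split_runs([1, 2, 3, 4, 7]))    # A's times from the sample
print(split_runs([5, 6, 8, 9, 10]))   # B's times from the sample
```

Each name then yields a list of ranges, which would still have to be exploded into one row per range to match the requested output.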

Source: https://habr.com/ru/post/1246292/

