Redshift query to combine the result if the data is continuous in the table

Question

I have a requirement in Redshift: I need to combine rows when the data is continuous. I have the following table, where user_id and product_id are varchar and login_time and log_out_time are timestamps.

user_id    product_id   login_time                log_out_time
----------------------------------------------------------------------
ashok      facebook     1/1/2017 1:00:00 AM       1/1/2017 2:00:00 AM
ashok      facebook     1/1/2017 2:00:00 AM       1/1/2017 3:00:00 AM
ashok      facebook     1/1/2017 3:00:00 AM       1/1/2017 4:00:00 AM
ashok      linked_in    1/1/2017 5:00:00 AM       1/1/2017 6:00:00 AM
ashok      linked_in    1/1/2017 6:00:00 AM       1/1/2017 7:00:00 AM
ashok      facebook     1/1/2017 8:00:00 AM       1/1/2017 9:00:00 AM
ram        facebook     1/1/2017 9:00:00 AM       1/1/2017 10:00:00 AM
ashok      linked_in    1/1/2017 7:00:00 AM       1/1/2017 8:00:00 AM

I need to combine rows when the data is continuous for a given user_id and product_id, that is, when one row's log_out_time equals the next row's login_time. So my result should look like:

user_id    product_id   login_time                log_out_time
----------------------------------------------------------------------
ashok      facebook     1/1/2017 1:00:00 AM       1/1/2017 4:00:00 AM
ashok      facebook     1/1/2017 8:00:00 AM       1/1/2017 9:00:00 AM
ashok      linked_in    1/1/2017 5:00:00 AM       1/1/2017 8:00:00 AM
ram        facebook     1/1/2017 9:00:00 AM       1/1/2017 10:00:00 AM

I tried the following query, but it did not help:

SELECT user_id, product_id, MIN(login_time), MAX(log_out_time) FROM TABLE_NAME GROUP BY user_id, product_id

The above query does not give the desired result, because it has no logic for checking whether the data is continuous. I need a query for this that does not use any custom function; any built-in Redshift function is allowed.

sql database amazon-redshift gaps-and-islands

ashokramcse 04 . '17 18:55

Gordon Linoff · Accepted Answer · 2017-07-04T20:51:18+0000

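Use lag() to find where each continuous run starts, a cumulative sum to assign a group number to each run, and then group by: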

select user_id, product_id,
       min(login_time) as login_time,
       max(log_out_time) as log_out_time
from (select t.*,
             -- running count of run starts = group number within each user/product
             sum(case when prev_lt = login_time then 0 else 1 end) over
                 (partition by user_id, product_id
                  order by login_time
                  rows between unbounded preceding and current row
                 ) as grp
      from (select t.*,
                   -- log_out_time of the previous row for the same user and product
                   lag(log_out_time) over (partition by user_id, product_id order by login_time) as prev_lt
            from t
           ) t
     ) t
group by user_id, product_id, grp;
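
To see why this works, it may help to trace the same logic outside the database. The following is only a rough Python sketch of the idea, not Redshift code: it sorts the sample rows from the question by user, product and login_time, starts a new group whenever the previous row's log_out_time differs from the current login_time (the lag() comparison), and keeps the earliest login and latest logout per group. The names ROWS, at and merge_continuous are made up for this illustration.

```python
from datetime import datetime

def at(hour):
    """Hypothetical helper: a timestamp on 2017-01-01 at the given hour."""
    return datetime(2017, 1, 1, hour)

# Sample rows from the question: (user_id, product_id, login_time, log_out_time)
ROWS = [
    ("ashok", "facebook", at(1), at(2)),
    ("ashok", "facebook", at(2), at(3)),
    ("ashok", "facebook", at(3), at(4)),
    ("ashok", "linked_in", at(5), at(6)),
    ("ashok", "linked_in", at(6), at(7)),
    ("ashok", "facebook", at(8), at(9)),
    ("ram", "facebook", at(9), at(10)),
    ("ashok", "linked_in", at(7), at(8)),
]

def merge_continuous(rows):
    """Merge rows of the same user and product whose login_time equals
    the previous row's log_out_time."""
    # Same ordering as: partition by user_id, product_id order by login_time
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    merged = []
    prev_key, prev_end = None, None
    for user, product, start, end in ordered:
        key = (user, product)
        if key == prev_key and prev_end == start:
            # Continuous with the previous row: extend the current group
            _, _, first_start, last_end = merged[-1]
            merged[-1] = (user, product, first_start, max(last_end, end))
        else:
            # First row of a partition, or a gap: start a new group
            merged.append((user, product, start, end))
        prev_key, prev_end = key, end
    return merged

for row in merge_continuous(ROWS):
    print(row[0], row[1], row[2], row[3])
```

On the sample data this yields four groups: ashok/facebook from 1:00 to 4:00, ashok/facebook from 8:00 to 9:00, ashok/linked_in from 5:00 to 8:00, and ram/facebook from 9:00 to 10:00, which matches the desired result in the question.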
