We register the main flows of actions that our users do on our iPad application on the table. Each thread has a beginning (marked βStartβ) and an end that is either marked βCanceledβ or βDoneβ and there should be no overlapping events.
The set of threads started, canceled, or terminated for the user is as follows:
user_id timestamp event_text event_num info@cafe-test.de 2016-10-30 00:08:00.966+00 Flow Started 0 info@cafe-test.de 2016-10-30 00:08:15.58+00 Flow Cancelled 2 info@cafe-test.de 2016-10-30 00:08:15.581+00 Flow Started 0 info@cafe-test.de 2016-10-30 00:34:44.134+00 Flow Finished 1 info@cafe-test.de 2016-10-30 00:42:26.102+00 Flow Started 0 info@cafe-test.de 2016-10-30 00:42:49.276+00 Flow Cancelled 2 info@cafe-test.de 2016-10-30 00:42:49.277+00 Flow Started 0 info@cafe-test.de 2016-10-30 00:59:47.337+00 Flow Cancelled 2 info@cafe-test.de 2016-10-30 00:59:47.337+00 Flow Started 0 info@cafe-test.de 2016-10-30 00:59:47.928+00 Flow Cancelled 2
We want to calculate how long the flow is canceled and the flow ends on average. To do this, we need to bind the Started with Canceled or Finished event. The following code does this, but cannot get around the following data quality problem that we have:
When a client wants to start a new stream (call him Flow2), before ending the current stream (Flow1), we remove the canceled event when we remove the started event for a new stream. So, Flow1 Cancelled=Flow2 Started . However, when we use window functions to order and control / delay between ordered events that actually belong to different threads, are mapped. Using this code:
WITH track_scf AS (SELECT user_id, timestamp, event_text, CASE WHEN event_text LIKE '%Started%' THEN 0 when event_text like '%Cancelled%' then 2 ELSE 1 END AS event_num FROM tracks ORDER BY 2, 4 desc ) SELECT user_id, CASE WHEN event_num=0 then timestamp end as start,CASE WHEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp,event_num) <> 0 THEN LEAD(timestamp, 1) OVER (PARTITION BY user_id ORDER BY timestamp,event_num) END as end, CASE WHEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp,event_num) <> 0 THEN LEAD(event_num, 1) OVER (PARTITION BY user_id ORDER BY timestamp,event_num) END as action FROM track_scf
We get this result:
user_id start end action info@cafe-test.de 2016-10-30 00:08:00.966+00 2016-10-30 00:08:15.58+00 2 info@cafe-test.de 2016-10-30 00:08:15.581+00 2016-10-30 00:34:44.134+00 1 info@cafe-test.de 2016-10-30 00:42:26.102+00 2016-10-30 00:42:49.276+00 2 info@cafe-test.de 2016-10-30 00:42:49.277+00 NULL NULL info@cafe-test.de 2016-10-30 00:59:47.337+00 2016-10-30 00:59:47.337+00 2 info@cafe-test.de NULL 2016-10-30 00:59:47.928+00 2
But we should get the following:
user_id start end action info@cafe-test.de 2016-10-30 00:08:00.966+00 2016-10-30 00:08:15.58+00 2 info@cafe-test.de 2016-10-30 00:08:15.581+00 2016-10-30 00:34:44.134+00 1 info@cafe-test.de 2016-10-30 00:42:26.102+00 2016-10-30 00:42:49.276+00 2 info@cafe-test.de 2016-10-30 00:42:49.277+00 2016-10-30 00:59:47.337+00 2 info@cafe-test.de 2016-10-30 00:59:47.337+00 2016-10-30 00:59:47.928+00 2
How do I change the code so that the pairing is correct?
mls.z source share