Oracle GROUP BY similar timestamps?

I have an action table with a structure like this:

    id  prd_id  act_dt               grp
    ------------------------------------
    1   1       2000-01-01 00:00:00
    2   1       2000-01-01 00:00:01
    3   1       2000-01-01 00:00:02
    4   2       2000-01-01 00:00:00
    5   2       2000-01-01 00:00:01
    6   2       2000-01-01 01:00:00
    7   2       2000-01-01 01:00:01
    8   3       2000-01-01 00:00:00
    9   3       2000-01-01 00:00:01
    10  3       2000-01-01 02:00:00

I want to split the rows of this action table into groups by product (prd_id) and activity date (act_dt), and update the group column (grp) with a value from a sequence for each of these groups.

The catch is that I need to group by "similar" timestamps, where similar means that consecutive records are exactly 1 second apart. In other words, within a group, the difference between any 2 adjacent records (sorted by date) is exactly 1 second, while the difference between the first and last records can be any amount of time, as long as all the intermediate records are 1 second apart.

For the sample data, the groups would be:

    id  prd_id  act_dt               grp
    ------------------------------------
    1   1       2000-01-01 00:00:00  1
    2   1       2000-01-01 00:00:01  1
    3   1       2000-01-01 00:00:02  1
    4   2       2000-01-01 00:00:00  2
    5   2       2000-01-01 00:00:01  2
    6   2       2000-01-01 01:00:00  3
    7   2       2000-01-01 01:00:01  3
    8   3       2000-01-01 00:00:00  4
    9   3       2000-01-01 00:00:01  4
    10  3       2000-01-01 02:00:00  5
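To pin down what the rule means, here is a minimal Python sketch (not Oracle code; the rows are copied from the tables above, and the function name `assign_groups` is hypothetical). It sorts by prd_id and act_dt and starts a new group whenever the product changes or the gap to the previous row is not exactly one second; the counter stands in for an Oracle sequence. A row-by-row pass like this is only meant as a reference for checking a set-based query's results.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (id, prd_id, act_dt)
ROWS = [
    (1, 1, "2000-01-01 00:00:00"),
    (2, 1, "2000-01-01 00:00:01"),
    (3, 1, "2000-01-01 00:00:02"),
    (4, 2, "2000-01-01 00:00:00"),
    (5, 2, "2000-01-01 00:00:01"),
    (6, 2, "2000-01-01 01:00:00"),
    (7, 2, "2000-01-01 01:00:01"),
    (8, 3, "2000-01-01 00:00:00"),
    (9, 3, "2000-01-01 00:00:01"),
    (10, 3, "2000-01-01 02:00:00"),
]

ONE_SECOND = timedelta(seconds=1)


def assign_groups(rows):
    """Return {id: grp}, starting a new group whenever the product changes
    or the gap to the previous row is not exactly one second."""
    parsed = [(rid, prd, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
              for rid, prd, ts in rows]
    # Sort by product, then by time (like PARTITION BY prd_id ORDER BY act_dt).
    parsed.sort(key=lambda r: (r[1], r[2]))
    groups = {}
    grp = 0
    prev_prd, prev_dt = None, None
    for rid, prd, dt in parsed:
        # The first row always starts a group, so prev_dt is never None here.
        if prd != prev_prd or dt - prev_dt != ONE_SECOND:
            grp += 1  # stands in for NEXTVAL of a hypothetical sequence
        groups[rid] = grp
        prev_prd, prev_dt = prd, dt
    return groups


print(assign_groups(ROWS))
# prints: {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4, 9: 4, 10: 5}
```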

What method would I use for this?

The table has about 20 million rows, in case that affects the choice of method.

1 answer

I'm not an Oracle whiz, so this one line is my best guess:

    ROUND((CAST(act_dt AS DATE) - DATE '2010-01-01') * 24 * 60 * 60) AS time_id,

It just needs to be "the number of seconds from [aDateConstant] to act_dt". The result may be negative; all that matters is turning act_dt into an integer number of seconds, so that timestamps 1 second apart differ by exactly 1. The rest should work as is.
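A quick Python check of that arithmetic (the 2010-01-01 constant comes from the answer; `to_seconds` is a hypothetical helper): dates before the constant give negative values, but rows one second apart still differ by exactly 1, which is all the grouping relies on.

```python
from datetime import datetime

# Reference point from the answer; any fixed constant would do.
EPOCH = datetime(2010, 1, 1)


def to_seconds(act_dt: datetime) -> int:
    """Whole seconds from EPOCH to act_dt (negative if act_dt is earlier)."""
    return int((act_dt - EPOCH).total_seconds())


a = to_seconds(datetime(2000, 1, 1, 0, 0, 0))
b = to_seconds(datetime(2000, 1, 1, 0, 0, 1))
print(a, b, b - a)
# prints: -315619200 -315619199 1
```

The subtraction has to be act_dt minus the constant, so that time_id grows as act_dt grows, in step with the row number.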

    WITH sequenced_data AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY t.prd_id ORDER BY t.act_dt) AS sequence_id,
               ROUND((CAST(t.act_dt AS DATE) - DATE '2010-01-01') * 24 * 60 * 60) AS time_id,
               t.*
        FROM yourTable t
    )
    -- no PARTITION BY here, so group numbers are unique across products
    SELECT DENSE_RANK() OVER (ORDER BY s.prd_id, s.time_id - s.sequence_id) AS group_id,
           s.*
    FROM sequenced_data s

Sample results for a single product:

    sequence_id | time_id | time_id - sequence_id | group_id
    ------------+---------+-----------------------+---------
              1 |       1 |                     0 |        1
              2 |       2 |                     0 |        1
              3 |       3 |                     0 |        1
              4 |       8 |                     4 |        2
              5 |       9 |                     4 |        2
              6 |      12 |                     6 |        3
              7 |      14 |                     7 |        4
              8 |      15 |                     7 |        4
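The query's logic can be mirrored in plain Python to see why it works. This is a sketch: `group_ids` is a hypothetical name, and time_id is taken as already converted to whole seconds. Within a run of timestamps that step by exactly 1 second, time_id and sequence_id both grow by 1 per row, so their difference stays constant; any larger gap bumps the difference and starts a new group. DENSE_RANK then turns each distinct (prd_id, difference) pair into a consecutive group number.

```python
def group_ids(rows):
    """rows: iterable of (row_id, prd_id, time_id), time_id in whole seconds.

    Returns {row_id: group_id}. Assumes no duplicate (prd_id, time_id) pairs.
    """
    # ROW_NUMBER() OVER (PARTITION BY prd_id ORDER BY act_dt)
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    seq = {}
    keyed = []
    for row_id, prd_id, time_id in ordered:
        seq[prd_id] = seq.get(prd_id, 0) + 1
        keyed.append((row_id, prd_id, time_id - seq[prd_id]))

    # DENSE_RANK() OVER (ORDER BY prd_id, time_id - sequence_id)
    distinct_keys = sorted({(prd_id, diff) for _, prd_id, diff in keyed})
    rank = {key: n for n, key in enumerate(distinct_keys, start=1)}
    return {row_id: rank[(prd_id, diff)] for row_id, prd_id, diff in keyed}


# The answer's sample: one product, time_id values 1, 2, 3, 8, 9, 12, 14, 15.
answer_sample = [(i, 1, t) for i, t in enumerate([1, 2, 3, 8, 9, 12, 14, 15], start=1)]
print(group_ids(answer_sample))
# prints: {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 4, 8: 4}

# The question's rows, with time_id = seconds since 2000-01-01 00:00:00.
question_rows = [
    (1, 1, 0), (2, 1, 1), (3, 1, 2),
    (4, 2, 0), (5, 2, 1), (6, 2, 3600), (7, 2, 3601),
    (8, 3, 0), (9, 3, 1), (10, 3, 7200),
]
print(group_ids(question_rows))
# prints: {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4, 9: 4, 10: 5}
```

The first call reproduces the group_id column of the sample table; the second reproduces the grp column from the question. The starting point for time_id does not matter, because shifting every time_id by the same constant shifts every difference equally and leaves the ranking unchanged.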


NOTE: This assumes there are no multiple entries with the same timestamp for a product. If there are, they must be filtered out first, perhaps just with a GROUP BY in an earlier CTE.
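Why duplicates matter, and one way to handle them, as a Python sketch (`group_ids_dedup` is a hypothetical name; the GROUP BY is modeled as a set of distinct (prd_id, time_id) pairs). With two rows at the same second, ROW_NUMBER gives them sequence 1 and 2, so their time_id - sequence_id values differ and they would be split into different groups. Numbering only the distinct timestamps and then joining back keeps them together.

```python
def group_ids_dedup(rows):
    """rows: iterable of (row_id, prd_id, time_id). Rows that share a
    (prd_id, time_id) pair end up in the same group."""
    rows = list(rows)
    # Like a GROUP BY prd_id, act_dt in an earlier CTE: distinct timestamps only.
    distinct = sorted({(prd_id, time_id) for _, prd_id, time_id in rows})

    # ROW_NUMBER over the distinct timestamps, per product.
    seq = {}
    island_of = {}
    for prd_id, time_id in distinct:
        seq[prd_id] = seq.get(prd_id, 0) + 1
        island_of[(prd_id, time_id)] = (prd_id, time_id - seq[prd_id])

    # DENSE_RANK over (prd_id, time_id - sequence_id).
    rank = {key: n for n, key in enumerate(sorted(set(island_of.values())), start=1)}

    # Join back so every original row, duplicates included, gets a group.
    return {row_id: rank[island_of[(prd_id, time_id)]] for row_id, prd_id, time_id in rows}


# Rows 1 and 2 share time_id 0; row 3 is one second later; row 4 follows a gap.
rows = [(1, 1, 0), (2, 1, 0), (3, 1, 1), (4, 1, 5)]
print(group_ids_dedup(rows))
# prints: {1: 1, 2: 1, 3: 1, 4: 2}
```

The Oracle counterpart would presumably compute the groups over SELECT prd_id, act_dt ... GROUP BY prd_id, act_dt and join the result back to the table on those two columns.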


Source: https://habr.com/ru/post/912245/

