Find every nth date in a continuous date stream

I would like to find/mark every 4th day in a continuous stream of dates inserted into my table, for each user, within a specific date range.

    CREATE TABLE mytable (
        id     INTEGER,
        myuser INTEGER,
        day    DATE NOT NULL,
        PRIMARY KEY (id)
    );

The problem is that, for each user, only 3 consecutive days are valid; after that there must be a “gap” of at least one day.

     id | myuser |    day     |
    ----+--------+------------+
      0 |    200 | 2012-01-12 | }
      1 |    200 | 2012-01-13 | }--> 3 continuous days
      2 |    200 | 2012-01-14 | }
      3 |    200 | 2012-01-15 | <-- not ok, user 200 should get warned and delete this
      4 |    200 | 2012-01-16 | }
      5 |    200 | 2012-01-17 | }--> 3 continuous days
      6 |    200 | 2012-01-18 | }
      7 |    200 | 2012-01-19 | <-- not ok, user 200 should get warned and delete this
      8 |    201 | 2012-01-12 | }
      9 |    201 | 2012-01-13 | }--> 3 continuous days
     10 |    201 | 2012-01-14 | }
     11 |    201 | 2012-01-16 | <-- ok, there is a one day gap here
     12 |    201 | 2012-01-17 |

The main goal is to look at a specific date range (usually a month) and identify the days that are not allowed. I also have to make sure that dates at the edge of the range are handled correctly: for example, if I look at the range 2012-02-01 to 2012-02-29, then 2012-02-01 may have to be a “break” if 2012-01-29 to 2012-01-31 are present in this table for the same user.
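To pin the rule down, here is a minimal Python sketch (not part of the original question) that reproduces the marks in the table above. It assumes “every 4th day” means the 4th, 8th, 12th, ... day of a run of consecutive dates, and that the caller passes one user's full history of days, so a run that starts before the range is still counted at the boundary. The function name every_fourth_day is hypothetical.

```python
from datetime import date, timedelta


def every_fourth_day(days, start, end):
    """Return the days in [start, end] that fall on the 4th, 8th, ... day of a
    run of consecutive calendar days. `days` is one user's full history."""
    flagged = []
    run_length = 0
    previous = None
    for d in sorted(set(days)):  # duplicates removed, oldest first
        # A day extends the current run only if it directly follows the previous day.
        if previous is not None and d - previous == timedelta(days=1):
            run_length += 1
        else:
            run_length = 1
        previous = d
        # Days before `start` still count toward the run, so month boundaries work.
        if run_length % 4 == 0 and start <= d <= end:
            flagged.append(d)
    return flagged


user_200 = [date(2012, 1, d) for d in range(12, 20)]
user_201 = [date(2012, 1, d) for d in (12, 13, 14, 16, 17)]
print(every_fourth_day(user_200, date(2012, 1, 1), date(2012, 1, 31)))
# [datetime.date(2012, 1, 15), datetime.date(2012, 1, 19)]
print(every_fourth_day(user_201, date(2012, 1, 1), date(2012, 1, 31)))
# []
```

For the boundary case, checking February for a user with 2012-01-29 to 2012-02-01 flags 2012-02-01, because the three January days are part of the same run.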

2 answers

I do not have access to PostgreSQL, but hopefully this works:

    -- start_date and end_date stand for the bounds of the range to check (date values)
    WITH grouped_data AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY myuser ORDER BY day)
                 - (day - start_date) AS user_group_id,  -- same value for consecutive days
               myuser,
               day
        FROM   mytable
        WHERE  day >= start_date - 3   -- look 3 days back across the range boundary
        AND    day <= end_date
    ),
    sequenced_data AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY myuser, user_group_id ORDER BY day) AS sequence_id,
               myuser,
               day
        FROM   grouped_data
    )
    SELECT myuser,
           day,
           CASE WHEN sequence_id % 4 = 0 THEN 1 ELSE 0 END AS should_be_a_break_day
    FROM   sequenced_data
    WHERE  day >= start_date;          -- report only days inside the range


Sorry, I didn't explain how it works; I had to rush off to a meeting :)

Example with start_date = '2012-01-14':

     id | myuser |    day     | ROW_NUMBER() | day - start_date | user_group_id
    ----+--------+------------+--------------+------------------+---------------
      0 |    200 | 2012-01-12 |            1 |               -2 | 1 - -2 = 3
      1 |    200 | 2012-01-13 |            2 |               -1 | 2 - -1 = 3
      2 |    200 | 2012-01-14 |            3 |                0 | 3 - 0  = 3
      3 |    200 | 2012-01-15 |            4 |                1 | 4 - 1  = 3
      4 |    200 | 2012-01-16 |            5 |                2 | 5 - 2  = 3
    ----+--------+------------+--------------+------------------+---------------
      5 |    201 | 2012-01-12 |            1 |               -2 | 1 - -2 = 3
      6 |    201 | 2012-01-13 |            2 |               -1 | 2 - -1 = 3
      7 |    201 | 2012-01-14 |            3 |                0 | 3 - 0  = 3
      8 |    201 | 2012-01-16 |            4 |                2 | 4 - 2  = 2

All consecutive dates get the same user_group_id. Each missing day lowers user_group_id by 1: in row 8, the one-day gap (2012-01-15 is missing) gives 2, and if that record were for the 17th instead, a gap of two days, the identifier would be 1.

Once you have the group id, ROW_NUMBER() can easily tell you which day of its run each row is. “A maximum of 3 days” is the same as “every 4th day should be a gap”, and “x % 4 = 0” picks out every 4th day.
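The group-id arithmetic can be checked outside the database. The following is a rough Python model of the two CTEs, not the answer's own code: break_days is a hypothetical name, rows are assumed to be unique (myuser, day) pairs, and (day - start_date).days stands in for PostgreSQL's date - date, which yields a number of days. For each row on or after start_date it returns (user_group_id, sequence_id, should_be_a_break_day).

```python
from collections import defaultdict
from datetime import date, timedelta


def break_days(rows, start_date, end_date):
    """Model of grouped_data + sequenced_data for (myuser, day) pairs."""
    window_start = start_date - timedelta(days=3)  # WHERE day >= start_date - 3
    by_user = defaultdict(list)
    for user, day in rows:
        if window_start <= day <= end_date:
            by_user[user].append(day)

    result = {}
    for user, days in by_user.items():
        run_counts = defaultdict(int)  # rows seen so far per user_group_id
        for row_number, day in enumerate(sorted(days), start=1):
            # ROW_NUMBER() - (day - start_date): constant within a run of
            # consecutive days, lowered by 1 for every missing day.
            user_group_id = row_number - (day - start_date).days
            # ROW_NUMBER() OVER (PARTITION BY myuser, user_group_id ORDER BY day)
            run_counts[user_group_id] += 1
            sequence_id = run_counts[user_group_id]
            if day >= start_date:  # final WHERE day >= start_date
                flag = 1 if sequence_id % 4 == 0 else 0
                result[(user, day)] = (user_group_id, sequence_id, flag)
    return result


rows = [(200, date(2012, 1, d)) for d in (12, 13, 14, 15, 16)]
rows += [(201, date(2012, 1, d)) for d in (12, 13, 14, 16)]
for key, value in sorted(break_days(rows, date(2012, 1, 14), date(2012, 1, 31)).items()):
    print(key, value)
```

Run on the example data with start_date = 2012-01-14, it reproduces the user_group_id column of the table (3 everywhere except 2 for user 201 on 2012-01-16) and flags only 2012-01-15 for user 200.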


Much simpler and faster with the lag() window function:

    SELECT myuser
          ,day
          ,COALESCE(lag(day, 3) OVER (PARTITION BY myuser ORDER BY day) = (day - 3)
                   ,FALSE) AS break_overdue
    FROM   mytable
    WHERE  day BETWEEN ('2012-01-12'::date - 3) AND '2012-01-16'::date;

Result:

     myuser |    day     | break_overdue
    --------+------------+---------------
        200 | 2012-01-12 | f
        200 | 2012-01-13 | f
        200 | 2012-01-14 | f
        200 | 2012-01-15 | t
        200 | 2012-01-16 | t
        201 | 2012-01-12 | f
        201 | 2012-01-13 | f
        201 | 2012-01-14 | f
        201 | 2012-01-16 | f

Key points:

  • After three consecutive days, the query marks every following day as break_overdue. It is unclear whether you want all of them marked once the rule has been violated, or only every 4th day.

  • I include 3 days before the start date (not just two) to determine if the first day already violates the rule.

  • The test is simple: if the 3rd row before the current row in the partition equals the current day - 3, the rule was violated. I wrap this in COALESCE to turn NULL into FALSE, for cosmetic reasons only. Guaranteed to work as long as (myuser, day) is unique.
    In PostgreSQL, you can subtract integers from a date, effectively subtracting days.

  • It runs in a single query level; no CTE or subquery required. It should be much faster.

  • You will need PostgreSQL 8.4 or later for window functions.
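A minimal Python model of the lag() test, assuming unique (myuser, day) pairs as the answer requires. The function break_overdue is hypothetical and only borrows the column name; None plays the role of SQL NULL, and the `lagged is not None and ...` guard plays the role of COALESCE(..., FALSE).

```python
from collections import defaultdict
from datetime import date, timedelta


def break_overdue(rows, first_day, last_day):
    """Model of COALESCE(lag(day, 3) OVER (PARTITION BY myuser ORDER BY day)
    = (day - 3), FALSE) over the rows selected by the WHERE clause."""
    window_start = first_day - timedelta(days=3)
    by_user = defaultdict(list)
    for user, day in rows:
        if window_start <= day <= last_day:  # WHERE day BETWEEN first_day - 3 AND last_day
            by_user[user].append(day)

    result = []
    for user in sorted(by_user):
        days = sorted(by_user[user])
        for i, day in enumerate(days):
            # lag(day, 3): the day three rows earlier in the partition, else NULL.
            lagged = days[i - 3] if i >= 3 else None
            # Three rows back being exactly three days back means four
            # consecutive days, provided (myuser, day) is unique.
            overdue = lagged is not None and lagged == day - timedelta(days=3)
            result.append((user, day, overdue))
    return result


rows = [(200, date(2012, 1, d)) for d in (12, 13, 14, 15, 16)]
rows += [(201, date(2012, 1, d)) for d in (12, 13, 14, 16)]
for user, day, overdue in break_overdue(rows, date(2012, 1, 12), date(2012, 1, 16)):
    print(user, day, "t" if overdue else "f")
```

Fed the example rows with first_day = 2012-01-12 and last_day = 2012-01-16, it yields the same t/f pattern as the result table above.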


Source: https://habr.com/ru/post/1390649/

