Find every nth date in a continuous date stream

Question

I would like to find / mark every 4th day in a continuous stream of dates inserted into my table, for each user, within a specific date range.

CREATE TABLE mytable (
    id     INTEGER,
    myuser INTEGER,
    day    DATE NOT NULL,
    PRIMARY KEY (id)
);

The problem is that for each user only 3 continuous days are valid; after that there has to be a one-day "gap".

 id | myuser |    day     |
----+--------+------------+
  0 |    200 | 2012-01-12 | }
  1 |    200 | 2012-01-13 | }--> 3 continuous days
  2 |    200 | 2012-01-14 | }
  3 |    200 | 2012-01-15 | <-- not ok, user 200 should get warned and delete this
  4 |    200 | 2012-01-16 | }
  5 |    200 | 2012-01-17 | }--> 3 continuous days
  6 |    200 | 2012-01-18 | }
  7 |    200 | 2012-01-19 | <-- not ok, user 200 should get warned and delete this
  8 |    201 | 2012-01-12 | }
  9 |    201 | 2012-01-13 | }--> 3 continuous days
 10 |    201 | 2012-01-14 | }
 11 |    201 | 2012-01-16 | <-- ok, there is a one day gap here
 12 |    201 | 2012-01-17 |

The main goal is to look at a specific date range (usually a month) and identify the days that are not allowed. I also have to make sure that dates at the edge of the range are handled correctly. For example, if I look at the date range from 2012-02-01 to 2012-02-29, then 2012-02-01 may have to be a "break" if 2012-01-29 through 2012-01-31 are present in this table for the same user.

+4

sql postgresql

return1.at Jan 12 '12 at 14:30


2 answers

Much simpler and faster with the lag() window function:

SELECT myuser
     , day
     , COALESCE(lag(day, 3) OVER (PARTITION BY myuser ORDER BY day) = (day - 3)
              , FALSE) AS break_overdue
FROM   mytable
WHERE  day BETWEEN ('2012-01-12'::date - 3) AND '2012-01-16'::date;

Result:

 myuser |    day     | break_overdue
--------+------------+---------------
    200 | 2012-01-12 | f
    200 | 2012-01-13 | f
    200 | 2012-01-14 | f
    200 | 2012-01-15 | t
    200 | 2012-01-16 | t
    201 | 2012-01-12 | f
    201 | 2012-01-13 | f
    201 | 2012-01-14 | f
    201 | 2012-01-16 | f

Major points:

After three consecutive days, the query marks every following consecutive day as break_overdue. It is unclear whether you want all of them marked once the rule has been violated, or only every 4th day.
I include 3 days before the start date (not just two) to determine whether the first day already violates the rule.
The test is simple: if the 3rd row before the current row in the partition equals the current day - 3, then the rule has been violated. I wrap all of this in COALESCE to turn NULL into FALSE, for cosmetic reasons only. Guaranteed to work as long as (myuser, day) is unique.
In PostgreSQL you can subtract an integer from a date, which effectively subtracts days.
It can be done at a single query level, no CTE or subquery required. It should be much faster.
You need PostgreSQL 8.4 or later for window functions.
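
For readers who want to check the lag() test outside the database, here is a minimal Python sketch of the same logic. It is not part of the original answer: the function name break_overdue and the sample list are made up for illustration, and it assumes (myuser, day) pairs are unique, as the query does.

```python
from datetime import date, timedelta
from itertools import groupby


def break_overdue(rows):
    """Mimic lag(day, 3) OVER (PARTITION BY myuser ORDER BY day) = day - 3.

    rows: iterable of (myuser, day) pairs, assumed unique.
    Returns a list of (myuser, day, flag) tuples ordered by user and day.
    """
    result = []
    ordered = sorted(rows)
    for user, group in groupby(ordered, key=lambda r: r[0]):
        days = [d for _, d in group]
        for i, d in enumerate(days):
            # lag(day, 3): the day three rows earlier in this user's partition
            lagged = days[i - 3] if i >= 3 else None
            # COALESCE(..., FALSE): a missing lag row counts as "no violation"
            flag = lagged is not None and lagged == d - timedelta(days=3)
            result.append((user, d, flag))
    return result


# Same rows as the result table above (hypothetical in-memory copy).
sample = [(200, date(2012, 1, d)) for d in (12, 13, 14, 15, 16)]
sample += [(201, date(2012, 1, d)) for d in (12, 13, 14, 16)]

flags = break_overdue(sample)
for user, d, flag in flags:
    print(user, d, "t" if flag else "f")
```

With unique days per user, the row three places back can only be exactly three days earlier if the three rows in between are the three days in between, which is why a single comparison is enough.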

+2

Erwin Brandstetter Jan 13 '12 at 1:58


MatBailie · Accepted Answer · 2012-01-12T14:55:53+0000

I do not have access to PostgreSQL, but hopefully this works ...

WITH
  grouped_data AS
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY myuser ORDER BY day) - (day - start_date) AS user_group_id,
    myuser,
    day
  FROM
    myTable
  WHERE
    day >= start_date - 3
    AND day <= end_date
)
,
  sequenced_data AS
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY myuser, user_group_id ORDER BY day) AS sequence_id,
    myuser,
    day
  FROM
    grouped_data
)
SELECT
  myuser,
  day,
  CASE WHEN sequence_id % 4 = 0 THEN 1 ELSE 0 END AS should_be_a_break_day
FROM
  sequenced_data
WHERE
  day >= start_date

Sorry I didn't explain how it works, I had to jump into a meeting :)

Example with start_date = '2012-01-14' ...

 id | myuser |    day     | ROW_NUMBER() | day - start_date | user_group_id
----+--------+------------+--------------+------------------+---------------
  0 |    200 | 2012-01-12 |            1 |               -2 | 1 - -2 = 3
  1 |    200 | 2012-01-13 |            2 |               -1 | 2 - -1 = 3
  2 |    200 | 2012-01-14 |            3 |                0 | 3 -  0 = 3
  3 |    200 | 2012-01-15 |            4 |                1 | 4 -  1 = 3
  4 |    200 | 2012-01-16 |            5 |                2 | 5 -  2 = 3
----+--------+------------+--------------+------------------+---------------
  5 |    201 | 2012-01-12 |            1 |               -2 | 1 - -2 = 3
  6 |    201 | 2012-01-13 |            2 |               -1 | 2 - -1 = 3
  7 |    201 | 2012-01-14 |            3 |                0 | 3 -  0 = 3
  8 |    201 | 2012-01-16 |            4 |                2 | 4 -  2 = 2

Any run of consecutive dates will have the same user_group_id. Each missing day in a gap lowers the user_group_id by 1 (see the row with id 8: if that record were for the 17th, a gap of 2 days, the id would be 1).

Once you have the user_group_id, ROW_NUMBER() can easily be used to tell which day of its sequence each row is. A maximum of 3 days in a row is the same as "every 4th day should be a gap", and "x % 4 = 0" picks out every 4th day.
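
The grouping trick described above can be sketched in plain Python, as a rough illustration only. This is not the answer's code: the function name should_be_break_days and the sample list are invented here, and start_date / end_date stand in for the placeholders used in the query.

```python
from datetime import date, timedelta
from itertools import groupby


def should_be_break_days(rows, start_date, end_date):
    """Mimic the two ROW_NUMBER() steps of the query.

    rows: iterable of (myuser, day) pairs, assumed unique.
    Returns (myuser, day, 0 or 1) for days on or after start_date.
    """
    lookback = start_date - timedelta(days=3)
    ordered = sorted(r for r in rows if lookback <= r[1] <= end_date)
    result = []
    for user, group in groupby(ordered, key=lambda r: r[0]):
        seen = {}  # user_group_id -> number of rows seen so far in that run
        for row_number, (_, d) in enumerate(group, start=1):
            # Consecutive days share this value; every missing day lowers it by 1.
            user_group_id = row_number - (d - start_date).days
            # Position of this day inside its run of consecutive days.
            seen[user_group_id] = seen.get(user_group_id, 0) + 1
            sequence_id = seen[user_group_id]
            if d >= start_date:
                result.append((user, d, 1 if sequence_id % 4 == 0 else 0))
    return result


# Rows from the question (hypothetical in-memory copy).
sample = [(200, date(2012, 1, d)) for d in range(12, 20)]
sample += [(201, date(2012, 1, d)) for d in (12, 13, 14, 16, 17)]

marks = should_be_break_days(sample, date(2012, 1, 14), date(2012, 1, 19))
for user, d, flag in marks:
    print(user, d, flag)
```

On the question's data this flags 2012-01-15 and 2012-01-19 for user 200 and nothing for user 201, which matches the rows the question marks as "not ok".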
