SQL: flattening overlapping date intervals

I have a database table with three columns that matter for this question:

  • Group ID, which groups rows together
  • Start date
  • End date

I want to create a view over this table in which overlapping date intervals that share the same group ID are flattened into a single interval.

Date intervals that do not overlap should not be flattened.

Example:

Group ID       Start         End
   1        2016-01-01   2017-12-31
   1        2016-06-01   2020-01-01
   1        2022-08-31   2030-12-31
   2        2010-03-01   2017-01-01
   2        2012-01-01   2013-12-31
   3        2001-01-01   9999-13-31

... becomes ...

Group ID       Start         End
   1        2016-01-01   2020-01-01
   1        2022-08-31   2030-12-31
   2        2010-03-01   2017-01-01
   3        2001-01-01   9999-12-31

Intervals can overlap in any way: they may be completely enclosed by other intervals, be staggered, or share the same start and/or end dates.
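
As a reference for the expected result, the example above can be reproduced outside the database with a sort-and-sweep merge. This is only a minimal Python sketch, not one of the SQL approaches discussed here: the function name `flatten_intervals` is hypothetical, rows are assumed to be `(group_id, start, end)` tuples, two intervals are assumed to overlap when the later one starts on or before the running end date, and the invalid date in the sample data is taken to be 9999-12-31.

```python
from datetime import date
from itertools import groupby


def flatten_intervals(rows):
    """Merge overlapping (group_id, start, end) rows within each group."""
    merged = []
    ordered = sorted(rows)  # by group_id, then start, then end
    for group_id, group_rows in groupby(ordered, key=lambda r: r[0]):
        current_start = current_end = None
        for _, start, end in group_rows:
            if current_end is not None and start <= current_end:
                # Overlaps the interval being built: extend it if needed.
                current_end = max(current_end, end)
            else:
                # Gap found: emit the finished interval and start a new one.
                if current_start is not None:
                    merged.append((group_id, current_start, current_end))
                current_start, current_end = start, end
        merged.append((group_id, current_start, current_end))
    return merged


rows = [
    (1, date(2016, 1, 1), date(2017, 12, 31)),
    (1, date(2016, 6, 1), date(2020, 1, 1)),
    (1, date(2022, 8, 31), date(2030, 12, 31)),
    (2, date(2010, 3, 1), date(2017, 1, 1)),
    (2, date(2012, 1, 1), date(2013, 12, 31)),
    (3, date(2001, 1, 1), date(9999, 12, 31)),
]

result = flatten_intervals(rows)
for group_id, start, end in result:
    print(group_id, start, end)
```

Sorting first means each row only has to be compared with the running end date of the interval currently being built, which covers enclosed, staggered, and identical intervals alike.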


+4


select * 
from dateclap d1
where not exists(
    select * 
    from dateclap d2 
    where d2.group_id=d1.group_id and 
        d2.end_date >= d1.start_date and 
        (d2.start_date < d1.start_date or 
        (d1.start_date=d2.start_date and d2.r_id<d1.r_id)))

, /, (r_id).

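With connect_by_root r_id, every row reached through the hierarchy carries the r_id of the row it started from. Grouping by that value and taking min/max gives the merged interval: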

select group_id, min(start_date) as start_date, max(end_date) as end_date
from dateclap d1
start with not exists(
    select * 
    from dateclap d2 
    where d2.group_id=d1.group_id and 
        d2.end_date >= d1.start_date and 
        (d2.start_date < d1.start_date or 
        (d1.start_date=d2.start_date and d2.r_id<d1.r_id)))
connect by nocycle
    prior group_id=group_id and 
    start_date between prior start_date and prior end_date
group by group_id, connect_by_root r_id

nocycle - , , . "connect by", "", .

P.S. :

CREATE TABLE "ANIKIN"."DATECLAP" 
(   
    "R_ID" NUMBER, 
    "GROUP_ID" NUMBER, 
    "START_DATE" DATE, 
    "END_DATE" DATE
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "ANIKIN" ;

(, , ) r_id sequence/ - , r_id .

+1

2 , :

CREATE OR REPLACE FUNCTION getMinStartDate
(
  p_group_id IN NUMBER,
  p_start    IN DATE
)
RETURN DATE AS
  v_result DATE;
BEGIN
  SELECT MIN(start_date)
    INTO v_result
    FROM my_data
   WHERE group_id = p_group_id
     AND start_date <= p_start
     AND end_date >= p_start;
  RETURN v_result;
END getMinStartDate;

CREATE OR REPLACE FUNCTION getMaxEndDate
(
  p_group_id IN NUMBER,
  p_end      IN DATE
)
RETURN DATE AS
  v_result DATE;
BEGIN
  SELECT MAX(end_date)
    INTO v_result
    FROM my_data
   WHERE group_id = p_group_id
     AND start_date <= p_end
     AND end_date >= p_end;
  RETURN v_result;
END getMaxEndDate;

, DISTINCT, :

SELECT DISTINCT
       group_id,
       getMinStartDate(group_id, start_date) AS start_date,
       getMaxEndDate(group_id, end_date) AS end_date
FROM   my_data;
+1
   select t1.group_id,
          least(min(t1.start_date), min(t2.start_date)) as start_date,
          greatest(max(t1.end_date), max(t2.end_date)) as end_date
   from test_interval t1, test_interval t2
   where (t1.start_date, t1.end_date) overlaps (t2.start_date, t2.end_date)
      and t1.rowid <> t2.rowid
      and t1.group_id = t2.group_id
   group by t1.group_id;

. OVERLAPS - . , , , , . rowid,

+1

9999-13-31 . .

, 9999-12-31. - ; 9999-12-31, . 8999-12-31; . {:-) , , . ( 9999-12-31, .)

, . , 2016 " " 2017-01-01 ( ), 2017 " " 2017-01-01. , , - . , 2016-08-31 , 2016-09-01, ; ( , 2016-08-31 ).

The OP did not say how end dates should be interpreted. I assume they are interpreted as described in the previous paragraph; otherwise the solution can easily be adapted, but you would first need to add 1 to the end dates and then subtract 1 at the end. This is exactly one of those cases where 9999-12-31 is not a good placeholder for "unknown".
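
To make the "add 1, then subtract 1" adaptation concrete, here is a rough Python sketch for a single group. It assumes end dates are inclusive calendar days, so an interval ending on 2016-08-31 and one starting on 2016-09-01 should be merged; the name `merge_inclusive` is hypothetical and not part of any answer here.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def merge_inclusive(intervals):
    """Merge (start, end) pairs whose end dates are inclusive days."""
    # Step 1: add one day to every end date (half-open intervals).
    # Note: this raises OverflowError for date.max (9999-12-31), which
    # is why that value is an awkward placeholder for "unknown".
    half_open = sorted((start, end + ONE_DAY) for start, end in intervals)

    merged = []
    for start, end in half_open:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Step 2: subtract the day again to return inclusive end dates.
    return [(start, end - ONE_DAY) for start, end in merged]


adjacent = [
    (date(2016, 1, 1), date(2016, 8, 31)),
    (date(2016, 9, 1), date(2016, 12, 31)),
]
result = merge_inclusive(adjacent)
print(result)
```

Without the one-day shift, the two intervals in `adjacent` would be kept separate, because 2016-09-01 is later than 2016-08-31.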

Solution

with m as
        (
         select group_id, start_date,
                   max(end_date) over (partition by group_id order by start_date 
                             rows between unbounded preceding and 1 preceding) as m_time
         from inputs   -- "inputs" is the name of the base table
         union all
         select group_id, NULL, max(end_date) from inputs group by group_id
        ),
     n as
        (
         select group_id, start_date, m_time 
         from m 
         where start_date > m_time or start_date is null or m_time is null
        ),
     f as
        (
         select group_id, start_date,
            lead(m_time) over (partition by group_id order by start_date) as end_date
         from n
        )
select * from f where start_date is not null
;

Output (with data provided):

  GROUP_ID START_DATE END_DATE 
---------- ---------- ----------
         1 2016-01-01 2020-01-01
         1 2022-08-31 2030-12-31
         2 2010-03-01 2017-01-01
         3 2001-01-01 8999-12-31
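
The three subqueries can be read as follows: m pairs each start date with the latest end date among all preceding rows, plus one extra row holding the overall maximum end date; n keeps only the rows that open a new interval; f pairs each of those start dates with the m_time of the next kept row. Below is a rough Python port of those steps for the rows of a single group, offered as an illustration rather than as a replacement for the query. The function name `flatten_group` is hypothetical, and `None` stands in for SQL NULL.

```python
from datetime import date


def flatten_group(intervals):
    """Port of the m / n / f steps for the (start, end) rows of one group."""
    rows = sorted(intervals)  # order by start_date

    # m: each start date with the max end date over the preceding rows
    # (None for the first row), plus a final (None, overall max) row.
    m = []
    running_max = None
    for start, end in rows:
        m.append((start, running_max))
        running_max = end if running_max is None else max(running_max, end)
    m.append((None, running_max))

    # n: keep rows that open a new interval, plus the final row.
    n = [(s, mt) for s, mt in m if s is None or mt is None or s > mt]

    # f: pair each kept start date with the m_time of the next kept row
    # (the SQL lead), which drops the final row with a None start date.
    return [(n[i][0], n[i + 1][1]) for i in range(len(n) - 1)]


group_1 = [
    (date(2016, 1, 1), date(2017, 12, 31)),
    (date(2016, 6, 1), date(2020, 1, 1)),
    (date(2022, 8, 31), date(2030, 12, 31)),
]
result = flatten_group(group_1)
for start, end in result:
    print(start, end)
```

The port appends the row with the empty start date last, which matches Oracle's default of sorting NULLs last in ascending order; the lead step in the query depends on that ordering.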

Source: https://habr.com/ru/post/1657369/

