SQL Challenge / Puzzle: how to combine nested ranges?

  • This task is based on the actual practice of using IP ranges.
  • The solution I came up with is based on the stack trace problem I introduced earlier. Each range start is treated as a PUSH operation, and each end of range + 1 is treated as a POP operation.

Task

We have a range data set, where each range has a start point, end point and value.

create table ranges ( range_start int not null ,range_end int not null ,range_val char(1) not null ) ; 

A range may contain a different range or follow a different range, but may not be equal to another range or intersect with another range.

These are valid relationships between ranges:

 (1) (2) (3) (4) --------- --------- --------- -------- ----------- --- --- --- 

This relationship is not valid :

 (5) (6) ------- -------- ------- -------- 

Our initial ranges, if presented graphically, might look something like this (the letter represents range_val ):

 AAAAAAAA BBCCCCCCC DDE F GGGGG H IIII J 

The goal is to take the original range set and create a new set according to the following rule:

A held range will override the corresponding sub-range of the containing range.

The requested result, if presented graphically, might look something like this.

 ADDHAAAF BIIJIGCCC 

Requirements

  • The solution should be the only SQL query (subqueries in order).
  • Using T-SQL, PL / SQL, etc. not allowed .
  • The use of UDF (Custom Functions) is not permitted .

Sample data

 AAAAAAAAAAAAAAAAAAAAAAAAAAAA BBBB CCCCCCCCCCCCCCCCCCCCCCCCC DDDE FFFFFFFF GGGGGGGGG HHHHHHHH IIIIIII JJ KKKLLL MM NN OOOOO P QQ insert into ranges (range_start,range_end,range_val) values (1 ,28 ,'A'); insert into ranges (range_start,range_end,range_val) values (31 ,34 ,'B'); insert into ranges (range_start,range_end,range_val) values (39 ,63 ,'C'); insert into ranges (range_start,range_end,range_val) values (1 ,3 ,'D'); insert into ranges (range_start,range_end,range_val) values (4 ,4 ,'E'); insert into ranges (range_start,range_end,range_val) values (7 ,14 ,'F'); insert into ranges (range_start,range_end,range_val) values (19 ,27 ,'G'); insert into ranges (range_start,range_end,range_val) values (43 ,50 ,'H'); insert into ranges (range_start,range_end,range_val) values (55 ,61 ,'I'); insert into ranges (range_start,range_end,range_val) values (1 ,2 ,'J'); insert into ranges (range_start,range_end,range_val) values (9 ,11 ,'K'); insert into ranges (range_start,range_end,range_val) values (12 ,14 ,'L'); insert into ranges (range_start,range_end,range_val) values (22 ,23 ,'M'); insert into ranges (range_start,range_end,range_val) values (25 ,26 ,'N'); insert into ranges (range_start,range_end,range_val) values (57 ,61 ,'O'); insert into ranges (range_start,range_end,range_val) values (13 ,13 ,'P'); insert into ranges (range_start,range_end,range_val) values (60 ,61 ,'Q'); 

Requested results

(Zeros are represented here as empty spaces)

 JJDEAAFFKKKLPLAAAAGGGMMGNNGA BBBB CCCCHHHHHHHHCCCCIIOOOQQCC range_start range_end range_val ----------- --------- --------- 1 2 J 3 3 D 4 4 E 5 6 A 7 8 F 9 11 K 12 12 L 13 13 P 14 14 L 15 18 A 19 21 G 22 23 M 24 24 G 25 26 N 27 27 G 28 28 A 29 30 31 34 B 35 38 39 42 C 43 50 H 51 54 C 55 56 I 57 59 O 60 61 Q 62 63 C 

Additional optional final line:

 64 
+5
source share
3 answers

Oracle Solution:

 with l as ( select level lvl from dual connect by level < 66 ), r as ( select range_start r1, range_end r2, range_val v, range_end - range_start + 1 cnt from ranges ), t1 as (select distinct lvl, nvl(max(v) keep (dense_rank first order by cnt) over (partition by lvl), '*' ) m from l left join r on lvl between r1 and r2 ), t2 as (select lvl, m, case when lag(m) over (order by lvl) <> m then 0 else 1 end mrk from t1), t3 as (select lvl, m, lvl - sum(mrk) over (order by lvl) grp from t2) select min(lvl) r1, max(lvl) r2, nullif(min(m), '*') val from t3 group by grp order by r1 

Conclusion is made upon request. My English is far from good, so it's hard to explain, but let it try:

  • l is a number generator,
  • r - data from ranges with calculated distance,
  • t1 - finds the value with the minimum distance for each lvl,
  • t2 - adds markers indicating whether the range begins,
  • t3 - adds a column, which we will use later for grouping data.
+1
source
  • The solution is based on the stack trace problem that I introduced earlier. Each range start is treated as a PUSH operation, and each end of range + 1 is treated as a POP operation.
  • You may notice that two internal analytic functions use the same window, therefore they are performed in one step.

Teradata h1>

 select new_range_start ,new_range_end ,last_value (range_val ignore nulls) over ( partition by stack_depth order by new_range_start ,range_start ,range_end desc rows unbounded preceding ) as new_range_val from (select new_range_start ,range_val ,range_start ,range_end ,sum (case when range_val is null then -1 else 1 end) over ( order by new_range_start, range_start ,range_end desc rows unbounded preceding ) as stack_depth ,min (new_range_start) over ( order by new_range_start ,range_start ,range_end desc rows between 1 following and 1 following ) - 1 as new_range_end from ( select range_start ,range_start ,range_end ,range_val from ranges union all select range_end + 1 ,range_start ,range_end ,cast (null as char(1)) from ranges ) r (new_range_start,range_start,range_end,range_val) ) r qualify new_range_end >= new_range_start order by new_range_start ; 

Oracle

 select new_range_start ,new_range_end ,new_range_val from (select new_range_start ,new_range_end ,last_value (range_val ignore nulls) over ( partition by stack_depth order by new_range_start ,range_start ,range_end desc rows unbounded preceding ) as new_range_val from (select new_range_start ,range_start ,range_end ,range_val ,sum (case when range_val is null then -1 else 1 end) over ( order by new_range_start, range_start ,range_end desc rows unbounded preceding ) as stack_depth ,lead (new_range_start) over ( order by new_range_start, range_start ,range_end desc ) - 1 as new_range_end from ( select range_start as new_range_start ,range_start ,range_end ,range_val from ranges union all select range_end + 1 ,range_start ,range_end ,cast (null as char(1)) from ranges ) r ) r ) r where new_range_end >= new_range_start order by new_range_start ; 

SQL Server / PostgreSQL / Hive

 select * from (select new_range_start ,new_range_end ,min (range_val) over ( partition by stack_depth,new_range_val_group_id ) as new_range_val from (select new_range_start ,new_range_end ,range_val ,stack_depth ,count (range_val) over ( partition by stack_depth order by new_range_start ,range_start ,range_end desc rows unbounded preceding ) as new_range_val_group_id from (select new_range_start ,range_start ,range_end ,range_val ,sum (case when range_val is null then -1 else 1 end) over ( order by new_range_start, range_start ,range_end desc rows unbounded preceding ) as stack_depth ,lead (new_range_start) over ( order by new_range_start, range_start ,range_end desc ) - 1 as new_range_end from ( select range_start as new_range_start ,range_start ,range_end ,range_val from ranges union all select range_end + 1 as new_range_start ,range_start ,range_end ,cast (null as char(1)) as range_val from ranges ) r ) r ) r ) r where new_range_end >= new_range_start order by new_range_start ; 
+1
source

Oracle 2 solution

  WITH borders AS /*get all borders of interval*/ (SELECT DISTINCT DECODE(is_end, 0, range_start, range_end) AS border ,is_end FROM ranges r, (SELECT 0 AS is_end FROM dual UNION ALL SELECT 1 AS is_end FROM dual)), interv AS /*get all intervals*/ (SELECT border + is_end AS beg_int ,lead(border) over(ORDER BY border, is_end ) - lead(DECODE(is_end, 0, 1, 0)) over(ORDER BY border, is_end) AS end_int FROM borders ORDER BY 1) SELECT i.beg_int ,i.end_int ,(SELECT MAX(r.range_val) keep (dense_rank FIRST ORDER BY r.range_end - r.range_start) FROM ranges r WHERE i.beg_int >= r.range_start AND i.end_int <= r.range_end) AS range_val FROM interv i WHERE beg_int <= end_int OR end_int IS NULL ORDER BY i.beg_int; 

Add solution without self-connect: EDIT: fixed defect.

  WITH intervals AS (SELECT DECODE(is_end, -1, range_val, NULL) AS range_val ,DECODE(is_end, -1, range_start, range_end) AS border ,is_end ,- (SUM(is_end) over(ORDER BY DECODE(is_end, -1, range_start, range_end), is_end, (range_end - range_start) * is_end)) AS poss ,(range_end - range_start) * is_end AS ord2 FROM ranges r ,(SELECT -1 AS is_end FROM dual UNION ALL SELECT 1 AS is_end FROM dual)), range_stack AS (SELECT border + DECODE(is_end, 1, 1, 0) AS begin_int ,lead(border) over(ORDER BY border, is_end, ord2) + DECODE(lead(is_end) over(ORDER BY border, is_end, ord2), 1, 0, -1) AS end_int ,last_value(range_val ignore NULLS) over(PARTITION BY poss ORDER BY border, is_end, ord2) AS range_val FROM intervals) SELECT begin_int ,end_int ,range_val FROM range_stack WHERE end_int >= begin_int OR end_int IS NULL; 
0
source

Source: https://habr.com/ru/post/1257906/


All Articles