Selecting matching records within a time range

I'm stuck on this one. I would like to do this in pure SQL, but at this point any solution will do.

I have tables ta and tb containing lists of events that occurred at about the same time. The goal is to find "orphan" entries in ta, i.e. events that have no matching event in tb. For example:

    create table ta (dt date, id varchar(1));
    insert into ta values (to_date('20130101 13:01:01', 'yyyymmdd hh24:mi:ss'), '1');
    insert into ta values (to_date('20130101 13:01:02', 'yyyymmdd hh24:mi:ss'), '2');
    insert into ta values (to_date('20130101 13:01:03', 'yyyymmdd hh24:mi:ss'), '3');
    create table tb (dt date, id varchar(1));
    insert into tb values (to_date('20130101 13:01:05', 'yyyymmdd hh24:mi:ss'), 'a');
    insert into tb values (to_date('20130101 13:01:06', 'yyyymmdd hh24:mi:ss'), 'b');

Now let's say I have to use a threshold of ±5 seconds. The search query would then look something like this:

    select ta.id ida, tb.id idb
    from ta, tb
    where tb.dt between (ta.dt - 5/86400) and (ta.dt + 5/86400)
    order by 1, 2

(script: http://sqlfiddle.com/#!4/b58f7c/5)
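To see why this join alone is not enough, here is a small Python sketch that reproduces it on the sample rows (Python is used only for illustration; the question itself is about Oracle SQL). The names `TA`, `TB`, `THRESHOLD` and `window_join` are hypothetical. Because the ±5-second windows overlap, every ta event falls within range of every tb event, so the join yields six candidate pairs rather than a one-to-one mapping.

```python
from datetime import datetime, timedelta

# Sample data from the question as (timestamp, id) tuples (hypothetical names).
TA = [
    (datetime(2013, 1, 1, 13, 1, 1), "1"),
    (datetime(2013, 1, 1, 13, 1, 2), "2"),
    (datetime(2013, 1, 1, 13, 1, 3), "3"),
]
TB = [
    (datetime(2013, 1, 1, 13, 1, 5), "a"),
    (datetime(2013, 1, 1, 13, 1, 6), "b"),
]

# 5/86400 of a day in Oracle DATE arithmetic is 5 seconds.
THRESHOLD = timedelta(seconds=5)


def window_join(ta, tb, threshold=THRESHOLD):
    """Return every (ida, idb) pair with tb.dt within +/- threshold of ta.dt.

    Mirrors `tb.dt between ta.dt - 5/86400 and ta.dt + 5/86400`,
    which is inclusive at both ends.
    """
    pairs = [
        (ida, idb)
        for dta, ida in ta
        for dtb, idb in tb
        if dta - threshold <= dtb <= dta + threshold
    ]
    return sorted(pairs)  # equivalent of "order by 1, 2"


if __name__ == "__main__":
    for ida, idb in window_join(TA, TB):
        print(ida, idb)
```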

Rules:

  • Events are mapped one-to-one.
  • The nearest event in tb to a given event in ta is considered the correct match.
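Applying the second rule on its own does not settle the mapping. The hypothetical Python sketch below picks, for each ta event, the closest tb event within ±5 seconds (breaking distance ties by tb id, which is an assumption) and deliberately ignores the one-to-one rule. On the sample data every ta event picks `a`, which shows why the two rules together need some order in which events are assigned.

```python
from datetime import datetime, timedelta

# Sample data from the question as (timestamp, id) tuples (hypothetical names).
TA = [
    (datetime(2013, 1, 1, 13, 1, 1), "1"),
    (datetime(2013, 1, 1, 13, 1, 2), "2"),
    (datetime(2013, 1, 1, 13, 1, 3), "3"),
]
TB = [
    (datetime(2013, 1, 1, 13, 1, 5), "a"),
    (datetime(2013, 1, 1, 13, 1, 6), "b"),
]
THRESHOLD = timedelta(seconds=5)


def nearest_per_ta(ta, tb, threshold=THRESHOLD):
    """Map each ta id to its closest tb id within the threshold, or None.

    Only the "nearest" rule is applied; several ta events may end up
    claiming the same tb event.
    """
    result = {}
    for dta, ida in ta:
        candidates = [
            (abs(dtb - dta), idb)  # (distance, tb id); id breaks distance ties
            for dtb, idb in tb
            if abs(dtb - dta) <= threshold
        ]
        result[ida] = min(candidates)[1] if candidates else None
    return result


if __name__ == "__main__":
    print(nearest_per_ta(TA, TB))
```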

However, the query should actually return something like:

    IDA | IDB
    1   | a
    2   | b
    3   | null   <-- orphan event

The sample query above shows exactly the problem I have: when the time windows overlap, it is hard to systematically pick the correct row.

dense_rank() seems like the way to pick the right rows, but what partitioning and ordering will rank them correctly?

Worth mentioning: I am doing this on Oracle 11gR2.

1 answer

It seems that this should be possible with a single SQL statement using Oracle analytic functions, perhaps with some combination of row_number(), lag() and max(). But I just could not wrap my head around it. I kept wanting to nest one analytic function inside another, and I don't think you can do that. You could work step by step using common table expressions, but I could not figure out how to make that work.

A procedural solution, however, is fairly straightforward, using PL/SQL along with an extra table to store the result. I use row_number() to assign a chronological rank to each row in each of your source tables. You want a deterministic result, so it is important to have a tie-breaker in case of duplicate date-times; hence my order by dt, id. The loop then walks the candidate pairs in (rowNumA, rowNumB) order and keeps a pair only when both ranks are higher than those of the last kept pair, so no row is used twice. Here is a demo on SQL Fiddle.
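The tie-breaker point can be illustrated with a small Python sketch that mimics `row_number() over (order by dt, id)` (the function `rank_rows` and the sample timestamps are hypothetical). Sorting by `(dt, id)` gives every row a unique, repeatable rank even when two rows share a timestamp, whereas ordering by `dt` alone would leave the relative order of such rows unspecified.

```python
from datetime import datetime


def rank_rows(rows):
    """Return (rank, dt, id) tuples with 1-based ranks.

    Mimics row_number() over (order by dt, id): id breaks ties between
    rows with equal dt, so the ranking does not depend on input order.
    """
    ordered = sorted(rows, key=lambda row: (row[0], row[1]))
    return [(rank, dt, ident) for rank, (dt, ident) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    t0 = datetime(2013, 1, 1, 13, 1, 0)
    t1 = datetime(2013, 1, 1, 13, 1, 5)
    rows = [(t1, "b"), (t1, "a"), (t0, "c")]  # "a" and "b" share a timestamp
    for rank, dt, ident in rank_rows(rows):
        print(rank, dt.time(), ident)
```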

Or see the full PL/SQL code below:

    create table result (
      dif number,
      ida varchar(1),
      idb varchar(1),
      dta date,
      dtb date
    );

    declare
      prevA integer := 0;  -- rank of the last matched ta row
      prevB integer := 0;  -- rank of the last matched tb row
    begin
      for rec in (
        -- chronological ranks, with id as tie-breaker
        with ordered_ta as (
          select dt dta, id ida, row_number() over (order by dt, id) rowNumA
          from ta
        ), ordered_tb as (
          select dt dtb, id idb, row_number() over (order by dt, id) rowNumB
          from tb
        )
        -- all candidate pairs within +/- 5 seconds, difference in seconds
        select ta.*, tb.*, abs(dta - dtb) * 86400 dif
        from ordered_ta ta
        join ordered_tb tb
          on dtb between (dta - 5/86400) and (dta + 5/86400)
        order by rowNumA, rowNumB
      ) loop
        -- keep a pair only if both rows come after the last kept pair
        if rec.rowNumA > prevA and rec.rowNumB > prevB then
          prevA := rec.rowNumA;
          prevB := rec.rowNumB;
          insert into result values (rec.dif, rec.ida, rec.idb, rec.dta, rec.dtb);
        end if;
      end loop;
    end;
    /

    -- matched pairs, then orphans from ta, then orphans from tb
    select * from result
    union all
    select null dif, id ida, null idb, dt dta, null dtb
    from ta where id not in (select ida from result)
    union all
    select null dif, null ida, id idb, null dta, dt dtb
    from tb where id not in (select idb from result);
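For checking the logic outside the database, here is a Python sketch of the same greedy pass (a re-expression of the PL/SQL above, not the answer's own code; `match_events` and the sample-data names are hypothetical, and the sketch omits the `dif`, `dta` and `dtb` columns). It ranks both lists by `(dt, id)`, walks the in-window candidate pairs in `(rank_a, rank_b)` order, keeps a pair only when both ranks exceed the last kept pair, and then appends the orphans from each side, like the two `not in` branches of the final query.

```python
from datetime import datetime, timedelta

# Sample data from the question as (timestamp, id) tuples (hypothetical names).
TA = [
    (datetime(2013, 1, 1, 13, 1, 1), "1"),
    (datetime(2013, 1, 1, 13, 1, 2), "2"),
    (datetime(2013, 1, 1, 13, 1, 3), "3"),
]
TB = [
    (datetime(2013, 1, 1, 13, 1, 5), "a"),
    (datetime(2013, 1, 1, 13, 1, 6), "b"),
]
THRESHOLD = timedelta(seconds=5)


def match_events(ta, tb, threshold=THRESHOLD):
    """Greedy chronological matching; returns (ida, idb) pairs plus orphans."""
    # Chronological ranks, (dt, id) order, like row_number() over (order by dt, id).
    ranked_a = list(enumerate(sorted(ta), start=1))
    ranked_b = list(enumerate(sorted(tb), start=1))
    # Candidate pairs within the window; the nested loops already yield
    # them in (rank_a, rank_b) order, like the cursor's order by.
    candidates = [
        (ra, rb, ida, idb)
        for ra, (dta, ida) in ranked_a
        for rb, (dtb, idb) in ranked_b
        if dta - threshold <= dtb <= dta + threshold
    ]
    prev_a = prev_b = 0
    matches = []
    for ra, rb, ida, idb in candidates:
        # Keep a pair only if both rows come after the last kept pair.
        if ra > prev_a and rb > prev_b:
            prev_a, prev_b = ra, rb
            matches.append((ida, idb))
    matched_a = {ida for ida, _ in matches}
    matched_b = {idb for _, idb in matches}
    # Orphans from each side, in chronological order.
    orphans_a = [(ida, None) for _, ida in sorted(ta) if ida not in matched_a]
    orphans_b = [(None, idb) for _, idb in sorted(tb) if idb not in matched_b]
    return matches + orphans_a + orphans_b


if __name__ == "__main__":
    for ida, idb in match_events(TA, TB):
        print(ida, idb)
```

On the sample data this returns 1-a, 2-b and 3 as an orphan, which is the output the question asks for. Note that the pairing is chronological rather than a global nearest-distance search: event 3 is actually closer to a (2 seconds) than event 1 is (4 seconds), yet 1 is paired with a because it comes first.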

Source: https://habr.com/ru/post/947385/
