INNER JOIN performance with the condition "<" or ">"

Question


I have two tables with a SessionOrder column. This column is an integer and has the following index: CREATE INDEX OSIDX_<internal name> ON <Entity>.

I execute the following query:

 SELECT i_0.rn, i_1.rn
 FROM (
     SELECT "RawEvent"."SessionOrder" as rn
     FROM "RawEvent" i_0
     WHERE something = 12
 )
 INNER JOIN (
     SELECT "RawEvent"."SessionOrder" as rn
     FROM "RawEvent" i_1
     WHERE something = 14
 ) ON i_0.rn > i_1.rn

The problem with this query is the condition ON i_0.rn > i_1.rn, which makes it so slow that it times out. I replaced it with ON i_0.rn = i_1.rn, and it was very fast, but obviously it did not give the expected results.

Does anyone know how to improve the performance of this query so that it no longer times out? Another goal of this question is to understand why it performs so poorly with ON i_0.rn > i_1.rn.

PS: increasing the timeout is not an option.


sql inner-join oracle indexing

Helder gonçalves Dec 21 '16 at 17:56


3 answers

Marmite bomber · Answer 1 · 2016-12-21T19:27:01+0000

Please check first whether you are really using an Oracle database. The syntax of your SQL suggests either a different DBMS or some kind of preprocessor.

To get an idea of what you can expect from such queries, you can use a dummy example as follows.

Creating Sample Data

 create table myTab as
 with mySeq as
   (select rownum SessionOrder from dual connect by level <= 10000)
 select 12 something, SessionOrder from mySeq
 union all
 select 14 something, SessionOrder from mySeq
 ;

This creates both subsets (something = 12 and something = 14), each containing 10,000 rows with SessionOrder values ranging from 1 to 10,000.

Test query

 create table myRes as
 select a.SessionOrder rn0, b.SessionOrder rn1
 from myTab a join myTab b on a.SessionOrder > b.SessionOrder and
 a.something = 12 and b.something = 14;

This produces 49,995,000 rows in less than 30 seconds.
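The row count can be cross-checked without materializing the result. The following is only a sketch against the myTab sample table created above (the alias pair_count is made up for illustration): each of the 10,000 values in subset 12 pairs with every strictly smaller value in subset 14, which gives 10,000 × 9,999 / 2 pairs.

```sql
-- Count the rows of the inequality join instead of storing them
-- (assumes the myTab sample table from above; pair_count is an illustrative alias)
select count(*) as pair_count
from   myTab a
join   myTab b
  on   a.SessionOrder > b.SessionOrder
 and   a.something = 12
 and   b.something = 14;
-- with the sample data: 10000 * 9999 / 2 = 49995000
```

The quadratic growth of the result is the point here: doubling the size of both subsets roughly quadruples the number of rows that have to be produced.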

If you expect to get such a large result in much less time, you will need advanced optimization. Without knowing your data and requirements, no general advice is possible.

Helder gonçalves · Answer 2 · 2016-12-22T11:46:05+0000

As recommended, I tried to solve the problem with a different strategy, which performs very well.

Despite this simple solution, I still don’t understand why the original query was so slow. I suspect the Oracle engine does not use the index.

 SELECT i_0."SessionOrder", i_1."SessionOrder"
 FROM "RawEvent" i_0
 INNER JOIN "RawEvent" i_1
     ON i_0."SessionOrder" < i_1."SessionOrder"
 WHERE i_0."something" = 12
   AND i_1."something" = 14
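Whether the index is actually used can be read from the execution plan rather than guessed. A minimal sketch, assuming a SQL client such as SQL*Plus and a usable PLAN_TABLE (both are assumptions, the thread does not say how the query is run); the statement is the query above:

```sql
-- Ask the optimizer for the plan without executing the query
EXPLAIN PLAN FOR
SELECT i_0."SessionOrder", i_1."SessionOrder"
FROM   "RawEvent" i_0
INNER JOIN "RawEvent" i_1
        ON i_0."SessionOrder" < i_1."SessionOrder"
WHERE  i_0."something" = 12
AND    i_1."something" = 14;

-- Display the plan stored by the statement above
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the output, an INDEX ... SCAN operation indicates index access, while TABLE ACCESS FULL indicates a full table scan.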

Marmite bomber · Answer 3 · 2016-12-24T14:09:54+0000

Your query performs three tasks:

1) accessing the data for both subsets (12 and 14),

2) joining the data, and

3) passing the result to the client.

Note that index access (which you suspect is causing the problem) is relevant only for step 1. So, to get a better picture, the first thing to do is to find out how the elapsed time is distributed across the three steps. This can be done using SQL*Plus (I use the same generated data as in my previous answer).

Data access

Since my table has no index, executing a count(*) performs a FULL TABLE SCAN. In the worst case, this scan is performed twice to get the data (once for each subset).

 SQL> set timi on
 SQL> set autotrace on
 SQL> select count(*) from mytab;

   COUNT(*)
 ----------
      20000

 Elapsed: 00:00:01.13

 Execution Plan
 ----------------------------------------------------------
 Plan hash value: 3284627250

 --------------------------------------------------------------------
 | Id  | Operation          | Name  | Rows  | Cost (%CPU)| Time     |
 --------------------------------------------------------------------
 |   0 | SELECT STATEMENT   |       |     1 |  5472   (1)| 00:00:01 |
 |   1 |  SORT AGGREGATE    |       |     1 |            |          |
 |   2 |   TABLE ACCESS FULL| MYTAB | 20000 |  5472   (1)| 00:00:01 |
 --------------------------------------------------------------------

The full table scan finishes in about one second, so accessing both subsets takes approximately two seconds.

Join

The elapsed time for the join can be simulated by running the join query as a CTAS (CREATE TABLE AS SELECT).

 SQL> create table myRes as
   2  select a.SessionOrder rn0, b.SessionOrder rn1
   3  from myTab a join myTab b on a.SessionOrder > b.SessionOrder and
   4  a.something = 12 and b.something = 14;

 Table created.

 Elapsed: 00:00:23.65

The join returns almost 50M rows (because of the greater-than condition) and takes about 21 seconds (I subtract the 2 seconds needed to access the data).

Transferring data to the client

We use set autotrace traceonly to suppress the query output on the client’s screen; the data is still transferred, so we can measure the time. (If you display the result on the screen, the time will be much higher.)

 SQL> SET ARRAYSIZE 5000
 SQL> set autotrace traceonly
 SQL> select a.SessionOrder rn0, b.SessionOrder rn1
   2  from myTab a join myTab b on a.SessionOrder > b.SessionOrder and
   3  a.something = 12 and b.something = 14;

 49995000 rows selected.

 Elapsed: 00:03:03.89

 Execution Plan
 ----------------------------------------------------------
 Plan hash value: 2857240533

 -----------------------------------------------------------------------------
 | Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
 -----------------------------------------------------------------------------
 |   0 | SELECT STATEMENT    |       |    49M|   667M| 11077   (2)| 00:00:01 |
 |   1 |  MERGE JOIN         |       |    49M|   667M| 11077   (2)| 00:00:01 |
 |   2 |   SORT JOIN         |       | 10000 | 70000 |  5473   (1)| 00:00:01 |
 |*  3 |    TABLE ACCESS FULL| MYTAB | 10000 | 70000 |  5472   (1)| 00:00:01 |
 |*  4 |   SORT JOIN         |       | 10000 | 70000 |  5473   (1)| 00:00:01 |
 |*  5 |    TABLE ACCESS FULL| MYTAB | 10000 | 70000 |  5472   (1)| 00:00:01 |
 -----------------------------------------------------------------------------

Here the transfer accounts for about 2:40 minutes (the 3:04 total minus the roughly 23 seconds for data access and join).

Summary

So in this scenario, out of a total of more than 3 minutes, only about 2 seconds are spent on data access (about 1%). Even if you cut the data access time to a tenth, you will see almost no difference. The problem is the join and, even more so, the transfer of the data to the client.

When an index can help

And of course it depends ...

In a very special case, when you have a very large table in which only a small fraction of the rows has something in (12,14), you can profit from an index defined on something AND SessionOrder. This allows index-only access, so the table itself does not have to be read.
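As a sketch of such an index on the myTab sample table (the index name is hypothetical, chosen only for illustration):

```sql
-- Composite index with the equality column first and the join column second
-- (myTab_something_so_idx is a made-up name)
create index myTab_something_so_idx on myTab (something, SessionOrder);
```

Because the query references only these two columns, both subsets can be read from the index alone. This shortens only the data access step; the join and the transfer of the result to the client are unaffected.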
