A top-N query does too much work despite the STOPKEY optimization

Question


This is going to be long, so here is a short summary to draw you in: my top-N query with COUNT STOPKEY and ORDER BY STOPKEY in its plan is still slow, for no good reason.

Now, the details. It starts with a slow function. In real life, it involves string manipulation with regular expressions. For demonstration purposes, here is the intentionally dumb recursive Fibonacci algorithm. I found it to be pretty fast for inputs up to 25, slowish around 30, and ridiculous at 35.

 -- I repeat: Please no advice on how to do Fibonacci correctly.
 -- This is slow on purpose!
 CREATE OR REPLACE FUNCTION tmp_fib (
   n INTEGER
 )
 RETURN INTEGER
 AS
 BEGIN
   IF n = 0 OR n = 1 THEN
     RETURN 1;
   END IF;
   RETURN tmp_fib(n-2) + tmp_fib(n-1);
 END;
 /

Now some input: a list of names and numbers.

 CREATE TABLE tmp_table (
   name VARCHAR2(20) UNIQUE NOT NULL,
   num NUMBER(2,0)
 );

 INSERT INTO tmp_table (name,num)
   SELECT 'Alpha', 10 FROM dual UNION ALL
   SELECT 'Bravo', 11 FROM dual UNION ALL
   SELECT 'Charlie', 33 FROM dual;

Here's an example of a slow query: use the slow Fibonacci function to select the names whose num generates a Fibonacci number with a doubled digit in it.

 SELECT p.name, p.num
 FROM tmp_table p
 WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
 ORDER BY p.name;

This is true for 11 and 33, so Bravo and Charlie are in the output. It takes about 5 seconds to run, almost all of which is the slow calculation of tmp_fib(33). So I want to make a faster version of the slow query by converting it to a top-N query. With N = 1, it looks like this:

 SELECT * FROM (
   SELECT p.name, p.num
   FROM tmp_table p
   WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
   ORDER BY p.name
 ) WHERE ROWNUM <= 1;

And now it returns the top result, Bravo. But it still takes 5 seconds to run! The only explanation is that it still computes tmp_fib(33), even though the result of that calculation is irrelevant to the output. It should have been able to decide that Bravo was going to be the output, so there was no need to test the WHERE clause for the rest of the table.

I thought maybe the optimizer just needs to be told that tmp_fib is expensive. So I tried telling it, like this:
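One way to test that suspicion is to count how often the function is actually invoked. The following is only a sketch: the package tmp_counter and the wrapper tmp_fib_counted are hypothetical names that do not appear in the original post, and SET SERVEROUTPUT assumes a SQL*Plus-style client.

```sql
-- Hypothetical helper package: holds a session-level call counter.
CREATE OR REPLACE PACKAGE tmp_counter IS
  calls INTEGER := 0;
END;
/

-- Hypothetical wrapper: bumps the counter, then delegates to tmp_fib.
CREATE OR REPLACE FUNCTION tmp_fib_counted (n INTEGER) RETURN INTEGER AS
BEGIN
  tmp_counter.calls := tmp_counter.calls + 1;
  RETURN tmp_fib(n);
END;
/

-- Reset the counter before the experiment.
BEGIN
  tmp_counter.calls := 0;
END;
/

-- Same top-1 query as above, but filtering through the counting wrapper.
SELECT * FROM (
  SELECT p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib_counted(p.num), '(.)\1')
  ORDER BY p.name
) WHERE ROWNUM <= 1;

-- Report how many times the wrapper was called by the query.
SET SERVEROUTPUT ON
BEGIN
  DBMS_OUTPUT.PUT_LINE('tmp_fib_counted calls: ' || tmp_counter.calls);
END;
/
```

If the reported count equals the number of rows in tmp_table, the filter was evaluated for every row before the sort, regardless of the ROWNUM limit.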

 ASSOCIATE STATISTICS WITH FUNCTIONS tmp_fib DEFAULT COST (999999999,0,0);

This changes some of the cost numbers in the plan, but it does not make the query run faster.
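To confirm that the association was registered, and to undo it after experimenting, something like the following should work. This is a sketch that assumes the function lives in the current schema, so that it shows up in the USER_ASSOCIATIONS dictionary view.

```sql
-- List the default costs currently associated with the function.
SELECT object_name, object_type, def_cpu_cost, def_io_cost, def_net_cost
FROM user_associations
WHERE object_name = 'TMP_FIB';

-- Remove the association again when done experimenting.
DISASSOCIATE STATISTICS FROM FUNCTIONS tmp_fib;
```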

The output of SELECT * FROM v$version, in case this is version-dependent:

 Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
 PL/SQL Release 11.2.0.2.0 - Production
 CORE 11.2.0.2.0 Production
 TNS for 64-bit Windows: Version 11.2.0.2.0 - Production
 NLSRTL Version 11.2.0.2.0 - Production

And here is the autotrace of the top-1 query. The plan appears to claim the query took 1 second, but that is not true. It ran for about 5 seconds.

 NAME                        NUM
 -------------------- ----------
 Bravo                        11

 Execution Plan
 ----------------------------------------------------------
 Plan hash value: 548796432

 -------------------------------------------------------------------------------------
 | Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
 -------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT        |           |     1 |    55 |     4  (25)| 00:00:01 |
 |*  1 |  COUNT STOPKEY          |           |       |       |            |          |
 |   2 |   VIEW                  |           |     1 |    55 |     4  (25)| 00:00:01 |
 |*  3 |    SORT ORDER BY STOPKEY|           |     1 |    55 |     4  (25)| 00:00:01 |
 |*  4 |     TABLE ACCESS FULL   | TMP_TABLE |     1 |    55 |     3   (0)| 00:00:01 |
 -------------------------------------------------------------------------------------

 Predicate Information (identified by operation id):
 ---------------------------------------------------

    1 - filter(ROWNUM<=1)
    3 - filter(ROWNUM<=1)
    4 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("P"."NUM")),'(.)\1'))

 Note
 -----
    - dynamic sampling used for this statement (level=2)

 Statistics
 ----------------------------------------------------------
          27  recursive calls
           0  db block gets
          25  consistent gets
           0  physical reads
           0  redo size
         593  bytes sent via SQL*Net to client
         524  bytes received via SQL*Net from client
           2  SQL*Net roundtrips to/from client
           1  sorts (memory)
           0  sorts (disk)
           1  rows processed
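The Rows and Time columns in an autotrace plan are optimizer estimates, which is why they can disagree with the observed 5 seconds. A sketch of one way to see actual per-step row counts and timings instead, assuming a SQL*Plus-style client, autotrace switched off, and sufficient privileges on the underlying V$ views:

```sql
-- Turn server output off first so that DISPLAY_CURSOR picks up the
-- query below rather than an internal DBMS_OUTPUT call.
SET SERVEROUTPUT OFF

-- Run the query with row-source statistics collection enabled.
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM (
  SELECT p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
) WHERE ROWNUM <= 1;

-- Show the plan of the last statement executed in this session,
-- including actual row counts (A-Rows) and elapsed time per step.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Comparing A-Rows on the TABLE ACCESS FULL step with the number of rows in the table shows how many rows passed through the filter before the sort.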

UPDATE: As I mentioned in the comments, an INDEX hint helps with this query. That would be good enough to be accepted as the correct answer, even though it does not translate well to my real-world scenario. And in an ironic twist, Oracle seems to have learned from the experience and now chooses the INDEX plan by default; I have to tell it NO_INDEX to reproduce the original slow behavior.

In the real scenario, I applied a more complex solution, rewriting the query as a PL/SQL function. Here is what my technique looks like when applied to the fib problem:
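The exact hint text used is not shown in the post, so the following is an assumption about what the hinted queries might look like. It relies on the index that Oracle creates to back the UNIQUE constraint on name, and names only the table alias so that the optimizer picks the index itself.

```sql
-- INDEX(p) asks the optimizer to access tmp_table (alias p) through an
-- index, so rows can arrive already ordered by name.
SELECT * FROM (
  SELECT /*+ INDEX(p) */ p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
) WHERE ROWNUM <= 1;

-- NO_INDEX(p) asks the optimizer not to use any index on tmp_table,
-- which should bring back the full scan followed by a sort.
SELECT * FROM (
  SELECT /*+ NO_INDEX(p) */ p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
) WHERE ROWNUM <= 1;
```

Hints are requests rather than guarantees, so it is worth checking the resulting plan for an index scan step before drawing conclusions from the timing.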

 CREATE OR REPLACE PACKAGE tmp_package IS
   TYPE t_namenum IS TABLE OF tmp_table%ROWTYPE;
   FUNCTION get_interesting_names (howmany INTEGER) RETURN t_namenum PIPELINED;
 END;
 /

 CREATE OR REPLACE PACKAGE BODY tmp_package IS
   FUNCTION get_interesting_names (howmany INTEGER) RETURN t_namenum PIPELINED IS
     -- The cursor returns rows in name order; the slow filter is applied in the loop.
     CURSOR c IS SELECT name, num FROM tmp_table ORDER BY name;
     rec c%ROWTYPE;
     outcount INTEGER;
   BEGIN
     OPEN c;
     outcount := 0;
     -- Stop fetching as soon as enough matching rows have been piped.
     WHILE outcount < howmany LOOP
       FETCH c INTO rec;
       EXIT WHEN c%NOTFOUND;
       IF REGEXP_LIKE(tmp_fib(rec.num), '(.)\1') THEN
         PIPE ROW(rec);
         outcount := outcount + 1;
       END IF;
     END LOOP;
     CLOSE c;  -- release the cursor once the loop is done
     RETURN;
   END;
 END;
 /

 SELECT * FROM TABLE(tmp_package.get_interesting_names(1));

Thanks to the responders who read the question, ran the tests, and helped me understand the execution plans. I will dispose of this question however they suggest.


performance oracle query-optimization

Wumpus Q. Wumbley May 21 '13 at 21:30


2 answers

Alex Poole · Answer 1 · 2013-05-21T22:23:42+0000

This is really a comment, posted as an answer only because it is too big for a comment. Running under 11.2.0.3 (OEL), your query:

 SELECT * FROM (
   SELECT p.name, p.num
   FROM tmp_table p
   WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
   ORDER BY p.name
 ) WHERE ROWNUM <= 1;

 NAME                        NUM
 -------------------- ----------
 Bravo                        11

 Elapsed: 00:00:00.094

 Plan hash value: 1058933870

 ----------------------------------------------------------------------------------
 | Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
 ----------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT     |           |     1 |    25 |     4  (25)| 00:00:01 |
 |*  1 |  COUNT STOPKEY       |           |       |       |            |          |
 |*  2 |   VIEW               |           |     3 |    75 |     4  (25)| 00:00:01 |
 |   3 |    SORT ORDER BY     |           |     3 |    75 |     4  (25)| 00:00:01 |
 |   4 |     TABLE ACCESS FULL| TMP_TABLE |     3 |    75 |     3   (0)| 00:00:01 |
 ----------------------------------------------------------------------------------

 Predicate Information (identified by operation id):
 ---------------------------------------------------

    1 - filter(ROWNUM<=1)
    2 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("NUM")),'(.)\1'))

 Note
 -----
    - dynamic sampling used for this statement (level=2)

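Note how the SORT ORDER BY step differs from what you saw (no STOPKEY), along with the corresponding Rows values, and that the function filter is applied at the VIEW step, after the sort. Moving the ORDER BY into a subquery gives a plan more like yours: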

 SELECT * FROM (
   SELECT * FROM (
     SELECT p.name, p.num
     FROM tmp_table p
     ORDER BY p.name
   )
   WHERE REGEXP_LIKE(tmp_fib(num), '(.)\1')
 ) WHERE ROWNUM <= 1;

 NAME                        NUM
 -------------------- ----------
 Bravo                        11

 Elapsed: 00:00:07.894

 Plan hash value: 548796432

 -------------------------------------------------------------------------------------
 | Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
 -------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT        |           |     1 |    25 |   171  (99)| 00:00:03 |
 |*  1 |  COUNT STOPKEY          |           |       |       |            |          |
 |   2 |   VIEW                  |           |     1 |    25 |   171  (99)| 00:00:03 |
 |*  3 |    SORT ORDER BY STOPKEY|           |     1 |    25 |   171  (99)| 00:00:03 |
 |*  4 |     TABLE ACCESS FULL   | TMP_TABLE |     1 |    25 |   170  (99)| 00:00:03 |
 -------------------------------------------------------------------------------------

 Predicate Information (identified by operation id):
 ---------------------------------------------------

    1 - filter(ROWNUM<=1)
    3 - filter(ROWNUM<=1)
    4 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("P"."NUM")),'(.)\1'))

 Note
 -----
    - dynamic sampling used for this statement (level=2)

I don't know how useful or practical this is in your real scenario, but in this case (in my environment, anyway), adding an index that covers all the selected columns, so that you get a full index scan instead of a full table scan, seems to change the behavior:

 CREATE INDEX tmp_index ON tmp_table(name, num);

 index TMP_INDEX created.

 SELECT * FROM (
   SELECT p.name, p.num
   FROM tmp_table p
   WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
   ORDER BY p.name
 ) WHERE ROWNUM <= 1;

 NAME                        NUM
 -------------------- ----------
 Bravo                        11

 Elapsed: 00:00:00.093

 Plan hash value: 1841475998

 -------------------------------------------------------------------------------
 | Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
 -------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT  |           |     1 |    25 |     1   (0)| 00:00:01 |
 |*  1 |  COUNT STOPKEY    |           |       |       |            |          |
 |*  2 |   VIEW            |           |     3 |    75 |     1   (0)| 00:00:01 |
 |   3 |    INDEX FULL SCAN| TMP_INDEX |     3 |    75 |     1   (0)| 00:00:01 |
 -------------------------------------------------------------------------------

 Predicate Information (identified by operation id):
 ---------------------------------------------------

    1 - filter(ROWNUM<=1)
    2 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("NUM")),'(.)\1'))

 Note
 -----
    - dynamic sampling used for this statement (level=2)

 SELECT * FROM (
   SELECT * FROM (
     SELECT p.name, p.num
     FROM tmp_table p
     ORDER BY p.name
   )
   WHERE REGEXP_LIKE(tmp_fib(num), '(.)\1')
 ) WHERE ROWNUM <= 1;

 NAME                        NUM
 -------------------- ----------
 Bravo                        11

 Elapsed: 00:00:00.093

 Plan hash value: 1841475998

 -------------------------------------------------------------------------------
 | Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
 -------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT  |           |     1 |    25 |     1   (0)| 00:00:01 |
 |*  1 |  COUNT STOPKEY    |           |       |       |            |          |
 |   2 |   VIEW            |           |     1 |    25 |     1   (0)| 00:00:01 |
 |*  3 |    INDEX FULL SCAN| TMP_INDEX |     1 |    25 |     1   (0)| 00:00:01 |
 -------------------------------------------------------------------------------

 Predicate Information (identified by operation id):
 ---------------------------------------------------

    1 - filter(ROWNUM<=1)
    3 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("P"."NUM")),'(.)\1'))

 Note
 -----
    - dynamic sampling used for this statement (level=2)

By the way, after running this several times with either of the ROWNUM variants, I eventually started getting ORA-01000: maximum open cursors exceeded. I drop the objects at the end of each run, but stay connected. I think this points to a different bug somewhere, though probably one unrelated to what you are seeing, since it happens even with the index scan.
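To see what is accumulating before ORA-01000 is raised, the session's open cursors can be listed. This is a sketch that assumes SELECT privileges on the V$ views, which an ordinary account may not have:

```sql
-- Cursors currently open (or cached) for this session.
SELECT sql_id, sql_text
FROM v$open_cursor
WHERE sid = SYS_CONTEXT('USERENV', 'SID');

-- The configured per-session limit that ORA-01000 refers to.
SELECT value
FROM v$parameter
WHERE name = 'open_cursors';
```

If the same recursive tmp_fib statement shows up many times in the first result, that would suggest the cursors opened by the function calls are not being released between runs.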

Wumpus Q. Wumbley · Answer 2 · 2013-06-03T21:34:09+0000

Interest seems to have died down, so I'm just going to summarize the possible solutions in an answer of my own.

Upgrade: newer Oracle versions seem to be better at optimizing this type of query.
Use the INDEX hint so that the inner query retrieves rows in already-sorted order, which allows STOPKEY to work properly.
Rewrite in PL/SQL with the inner query as a cursor. Fetch from the cursor until you have enough matches, then close it.
