Oracle uses or ignores indexed column depending on to_date format (literal)

I use an index column used as a filter, putting it between two literal values. (The column is in the second position of the index and actually makes execution slower, I will deal with this later).

What confuses me is that Oracle (11.2.0.3.0) uses or ignores the specified index, depending on the format of the value strings and the format provided by to_date:

This ignores the index:

SQL> SELECT * 2 FROM gprs_history_import gh 3 WHERE start_call_date_time BETWEEN 4 to_date('20140610 000000','yyyymmdd hh24miss') AND 5 to_date('20140610 235959','yyyymmdd hh24miss') 6 / Execution Plan ---------------------------------------------------------- Plan hash value: 990804809 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 350 | 219K| 242K (1)| 00:56:42 | | | | 1 | PARTITION RANGE SINGLE| | 350 | 219K| 242K (1)| 00:56:42 | 74 | 74 | | 2 | PARTITION LIST ALL | | 350 | 219K| 242K (1)| 00:56:42 | 1 | 3 | |* 3 | TABLE ACCESS FULL | GPRS_HISTORY_IMPORT | 350 | 219K| 242K (1)| 00:56:42 | 220 | 222 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) 

The index is used here (note the space after the date part on line 4):

 SQL> SELECT * 2 FROM gprs_history_import gh 3 WHERE start_call_date_time BETWEEN 4 to_date('20140610 ','yyyymmdd ') AND 5 to_date('20140610 235959','yyyymmdd hh24miss') 6 / Execution Plan ---------------------------------------------------------- Plan hash value: 464458373 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 350 | 219K| 2795K (1)| 10:52:15 | | | |* 1 | FILTER | | | | | | | | | 2 | PARTITION RANGE ITERATOR | | 350 | 219K| 2795K (1)| 10:52:15 | KEY | 74 | | 3 | PARTITION LIST ALL | | 350 | 219K| 2795K (1)| 10:52:15 | 1 | 3 | | 4 | TABLE ACCESS BY LOCAL INDEX ROWID| GPRS_HISTORY_IMPORT | 350 | 219K| 2795K (1)| 10:52:15 | KEY | 222 | |* 5 | INDEX SKIP SCAN | GPRS_HISTORY_IMPORT_IDX1 | 1 | | 2795K (1)| 10:52:15 | KEY | 222 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(TO_DATE('20140610 ','yyyymmdd ')<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) 5 - access("START_CALL_DATE_TIME">=TO_DATE('20140610 ','yyyymmdd ') AND "START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) filter("START_CALL_DATE_TIME">=TO_DATE('20140610 ','yyyymmdd ') AND "START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) 

(The filter in (1) seems a little silly, as if Oracle did not understand the expression)

Again, this is not (I removed the trailing space):

 SQL> SELECT * 2 FROM gprs_history_import gh 3 WHERE start_call_date_time BETWEEN 4 to_date('20140610','yyyymmdd') AND 5 to_date('20140610 235959','yyyymmdd hh24miss') 6 / Execution Plan ---------------------------------------------------------- Plan hash value: 990804809 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 350 | 219K| 242K (1)| 00:56:42 | | | | 1 | PARTITION RANGE SINGLE| | 350 | 219K| 242K (1)| 00:56:42 | 74 | 74 | | 2 | PARTITION LIST ALL | | 350 | 219K| 242K (1)| 00:56:42 | 1 | 3 | |* 3 | TABLE ACCESS FULL | GPRS_HISTORY_IMPORT | 350 | 219K| 242K (1)| 00:56:42 | 220 | 222 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("START_CALL_DATE_TIME"<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) 

Putting quotes around space eliminates the use of an index.

What gives?

+6
source share
2 answers

Good - I'll try, this is basically a deduction from availabe. Information:

Why does Oracle choose a different execution plan?

It seems that in your second query with an unusual Date-Format, the optimizer has no idea what the value of the resulting date is. You see the predicate filter:

1 - filter(TO_DATE('20140610 ','yyyymmdd ')<=TO_DATE(' 2014-06-10 23:59:59', 'syyyy-mm-dd hh24:mi:ss'))

This means that the optimizer is not even sure that the first date is less than the second! This means that the optimizer has no idea about the number of rows returned and simply uses the general plan without taking into account specific statistics. It would be the same if you had a custom function xyt () that returns the date for the range. The optimizer is not able to find out what value the date value will have. This means that you get a general goal plan, which should be pretty decent for any given date range.

In the first and third cases, the optimizer seems to immediately understand the date and can guess the number of rows in the date range using statistics. Therefore, when the second query was in the Optimizer, for example BETWEEN X AND 3 , this query is similar to BETWEEN 1 AND 3 Therefore, it optimizes the query plan for the predicted number of rows returned!

The strange thing is, it seems that the query optimizer has such problems with a strange date format, it may be filed as an error / request for improvement ...

But an important point:

  • A full table scan does not have to be a BAD plan ... Just like using an index is not always faster!
  • The cost of a query plan is in no way directly related to the actual lead time or performance - it is an internal dimension for comparing different plans for ONE REQUEST (Thus, you cannot compare the cost of different requests, for example, your requests 1, 2 and 3)

In principle, if you return a large number of rows from a table, a full table scan without access to the index will be much faster in many cases, especially when working on certain sections! - Scanning in the table will have access only to pertition for the corresponding date range - so only for the date in question and returns all rows from this section. This is much faster than querying the index for each individual row, and then retrieving the row by the access index ... Try to profile queries - a full table scan in the section should be 3 times faster with a much lower IO value

+2
source

Errors in the optimizer or the parser lead to the fact that some date formats reduce the static partitioning of dynamic partitioning. Partition trimming changes lead to different capacities and costs, which then lead to significant changes in many other parts of the plan.

This answer only partially explains the problem and contains some assumptions. I hope he at least shed some light on what the problem is and no. This is at least a good starting point if you really need a full explanation and you want to send an Oracle service request.

Terminology and some background readings

Trimming a static section is when the optimizer determines at compile time which section will be used. Statistics refer to sections, which leads to higher power ratings, which leads to better plans. For example, think of a table broken down by status, where the partition for CANCELED is tiny and the partition for ACTIVE is large. Knowing which section is used can completely change the connection order and access methods to the optimal plan. Pstart and Pstop will be numeric values ​​when using static section trimming.

Partition dynamic cropping is when the optimizer cannot determine the partition before runtime. Data is extracted only from the necessary sections, but the execution plan is built without much knowledge of which section is used. Some statistical section estimates will be a simple average of all available sections. In the example of a table broken down by status, the average value of a tiny section and a large section also does not accurately reflect. Either Pstart or Pstop will include the word KEY when dynamic section cropping is used.

The Oracle Database Database VLDB and Partitioning Guide contains a section on Data Type Conversions that are worth reading. For example, one relevant quote from the manual:

Only the correctly applied TO_DATE function ensures that the database is able to uniquely determine the date value and use it for static trimming, which is especially useful for accessing a single section.

Schema and data example

This simple test case demonstrates the problem. It also eliminates common performance issues, such as a lack of statistics.

First, create a sample table with two sections, one large and one small.

 create table gprs_history_import(id number, start_call_date_time date) partition by range (start_call_date_time) ( partition p_large values less than (date '2014-06-01'), partition p_small values less than (date '2014-07-01') ); insert into gprs_history_import select level, date '2014-05-01' from dual connect by level <= 1000; insert into gprs_history_import select level, date '2014-06-01' from dual connect by level <= 10; begin dbms_stats.gather_table_stats(user, 'GPRS_HISTORY_IMPORT'); end; / select count(*) from gprs_history_import partition (p_large); -- 1000 select count(*) from gprs_history_import partition (p_small); -- 10 

Static dynamics cause poor power ratings

The static power rating is ideal 1000. The extra space in the second date format changes Pstop from 1 to KEY . The plan changes from static to dynamic cropping of partitions. Dynamic score is inaccurate 505, average 1000 and 10

For simplicity, this example shows only a poor rating of power. There is no need to show a slow query, as evaluating bad strings inevitably leads to poor execution plans for many reasons.

 explain plan for select /* static partition pruning */ * from gprs_history_import where start_call_date_time < to_date('20140601 000000','yyyymmdd hh24miss'); select * from table(dbms_xplan.display); Plan hash value: 452971246 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1000 | 12000 | 16 (0)| 00:00:01 | | | | 1 | PARTITION RANGE SINGLE| | 1000 | 12000 | 16 (0)| 00:00:01 | 1 | 1 | | 2 | TABLE ACCESS FULL | GPRS_HISTORY_IMPORT | 1000 | 12000 | 16 (0)| 00:00:01 | 1 | 1 | -------------------------------------------------------------------------------------------------------------- explain plan for select /* dybnamic partition pruning */ * from gprs_history_import where start_call_date_time < to_date('20140601 ','yyyymmdd '); select * from table(dbms_xplan.display); Plan hash value: 2464174375 ---------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ---------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 505 | 6060 | 29 (0)| 00:00:01 | | | | 1 | PARTITION RANGE ITERATOR| | 505 | 6060 | 29 (0)| 00:00:01 | 1 | KEY | |* 2 | TABLE ACCESS FULL | GPRS_HISTORY_IMPORT | 505 | 6060 | 29 (0)| 00:00:01 | 1 | KEY | ---------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("START_CALL_DATE_TIME"<TO_DATE('20140601 ','yyyymmdd ')) 

Date format parsing issues

Now about some assumptions about why the request moves from static to dynamic partitioning.

It is not always obvious when the optimizer can use static and dynamic splitting. In general, literals allow static cropping, while variables require dynamic cropping.

 --#1: Obviously static: It uses an unambiguous ANSI date literal. select * from gprs_history_import where start_call_date_time = date '2000-11-01'; --#2: Obviously dyanmic: It uses a bind variable. select * from gprs_history_import where start_call_date_time = :date; --#3: Probably dynamic: The optimizer cannot always infer the literal value. select * from gprs_history_import where start_call_date_time = (select date '2000-11-01' from dual); --#4: Probably static: FEB is not always valid, but Oracle can figure that out. select * from gprs_history_import where start_call_date_time = to_date('01-FEB-2000', 'DD-MON-YYYY'); 

When you look at all the issues of productivity and internationalization in Case No. 4, it becomes clear how difficult it is to parse dates. The value of to_date('01-FEB-2000', 'DD-MON-YYYY') depends on several NLS parameters, such as NLS_DATE_LANGUAGE . The request is valid for English, but not for German. And if NLS_CALENDAR not set to GREGORIAN , than the numeric date format may be wrong. The to_date string to_date not a binding value, but it is also not literal.

The difference between true date literals and formatted strings is more obvious if hard parses are counted. Request number 1 will not force a hard parsing, even if the language is changed, but request number 4 will be. This can be demonstrated by performing several options for each of them, changing the language, and then running select value from v$sesstat natural join v$statname where name = 'parse count (hard)' and sid = userenv('SID'); .

Oracle should have a variable somewhere to indicate "it is not a binding variable, but may lead to different plans based on NLS settings." This variable does not always truncate the dynamic section, but there should be some errors that sometimes interrupt it.

+2
source

Source: https://habr.com/ru/post/970913/


All Articles