LOW_VALUE and HIGH_VALUE in USER_TAB_COLUMNS

I have a question regarding the LOW_VALUE and HIGH_VALUE columns in the USER_TAB_COLUMNS (or equivalent) view.

I'm wondering whether these values are always accurate. For example, if a column has 500k rows with a value of 1, 500k rows with a value of 5, and 1 row with a value of 1000, then LOW_VALUE should be 1 (after converting from the RAW format) and HIGH_VALUE should be 1000 (after converting from the RAW format). However, are there any circumstances where Oracle "skips" this outlier value and instead has 5 for HIGH_VALUE?

Also, what is the purpose of these two values?

thanks

1 answer

As with all optimizer statistics, these values are estimates whose accuracy varies, and they reflect the table as it was when statistics were last gathered. You should expect them to be close but not perfectly accurate, and it is quite possible for them to be wildly wrong.

When you gather statistics, you specify the percentage of rows (or blocks) to sample. You can specify a sample size of 100%, in which case Oracle examines every row, but it is relatively rare to request a sample that large. It is much more efficient to request a much smaller sample size (either explicitly or by letting Oracle determine the sample size automatically). If the sampled rows do not include the one row with a value of 1000, then HIGH_VALUE will not be 1000; it will be 5, if that is the largest value the sample saw.
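The effect of sampling on HIGH_VALUE can be illustrated with a small simulation. This is only a sketch in Python, not how Oracle samples internally: it assumes simple random row sampling, uses a scaled-down copy of the table from the question, and the function names are hypothetical.

```python
import random


def sampled_high_value(values, sample_percent, seed=0):
    """Return the largest value seen in a random sample of the rows.

    Assumption: simple random row sampling without replacement.
    Oracle's real sampling (row or block based) is more involved.
    """
    rng = random.Random(seed)
    sample_size = max(1, len(values) * sample_percent // 100)
    return max(rng.sample(values, sample_size))


def chance_outlier_missed(sample_percent):
    """Probability that one specific row is absent from the sample."""
    return 1 - sample_percent / 100


# Scaled-down version of the column in the question:
# many 1s, many 5s, and a single outlier of 1000.
values = [1] * 5000 + [5] * 5000 + [1000]

# A 100% sample always sees the outlier.
print(sampled_high_value(values, 100))

# A 10% sample may or may not include it, depending on the random draw.
print(sampled_high_value(values, 10))
print(chance_outlier_missed(10))
```

Under this simplified model, a 10% sample misses the single outlier row about nine times out of ten, in which case the largest value seen would be 5.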

Statistics are also a snapshot in time. By default, 11g gathers statistics every night on objects that have changed enough since their last statistics gathering to warrant a refresh, although you can disable this job or change its settings. Therefore, if you gather statistics today with a 100 percent sample size and get a HIGH_VALUE of 1000, then insert one row with a value of 3000 and never change the table again, it is likely that Oracle will never gather statistics on this table again (unless you explicitly request it), and HIGH_VALUE will remain 1000 forever.
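A rough sketch of why a single inserted row does not trigger a new statistics gather follows. It is written in Python and is only a model: the function name is hypothetical, and the 10% threshold is an assumption chosen to mirror the default of the STALE_PERCENT preference in DBMS_STATS, so check the setting on your own system.

```python
def is_stale(rows_changed, rows_at_last_gather, stale_percent=10):
    """Return True if enough rows changed for the statistics to count as stale.

    Assumption: staleness is judged by the percentage of rows modified
    since the last gather, with a 10% threshold.
    """
    if rows_at_last_gather == 0:
        return rows_changed > 0
    return 100 * rows_changed / rows_at_last_gather > stale_percent


# One inserted row (the value 3000) against about a million existing rows.
print(is_stale(1, 1_000_001))        # False: far below the threshold

# A bulk load touching roughly 15% of the table.
print(is_stale(150_000, 1_000_001))  # True
```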

Assuming there is no histogram on the column (that is a whole separate discussion), Oracle uses LOW_VALUE and HIGH_VALUE to estimate how selective a particular predicate will be. Suppose LOW_VALUE is 1, HIGH_VALUE is 1000, the table contains 1,000,000 rows, the column does not have a histogram, and you execute a query like

 SELECT * FROM some_table WHERE column_name BETWEEN 100 and 101 

Oracle assumes that the data is evenly distributed between 1 and 1000, so it estimates that this query returns roughly 1000 rows: the number of rows in the table (1 million) multiplied by the fraction of the value range that the predicate covers (about 1/1000). This selectivity estimate, in turn, drives the optimizer's decisions about whether it would be more efficient to use an index or scan the table, which join methods to use, in what order to evaluate the various predicates, and so on. If the data is unevenly distributed, however, you will most likely end up with a histogram on the column, which gives Oracle much more detailed information about the distribution of the data than LOW_VALUE and HIGH_VALUE alone.
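The uniform-distribution arithmetic described above can be written out as a short sketch. This Python function is a simplification for illustration, not Oracle's actual formula (the real optimizer applies further adjustments); the function name is hypothetical, and it assumes the predicate range lies inside LOW_VALUE..HIGH_VALUE.

```python
def estimate_between_rows(num_rows, low_value, high_value, lower, upper):
    """Estimate the rows matched by `col BETWEEN lower AND upper`.

    Assumptions: no histogram, values spread evenly between low_value
    and high_value, and the predicate range falls inside that interval.
    """
    if high_value == low_value:
        # Only one distinct value is known; assume every row matches.
        return float(num_rows)
    fraction = (upper - lower) / (high_value - low_value)
    return num_rows * fraction


# The query from the answer: BETWEEN 100 AND 101 on a 1,000,000-row table.
print(estimate_between_rows(1_000_000, 1, 1000, 100, 101))  # roughly 1000

# The skewed column from the question: nearly every row is 1 or 5,
# yet the same uniform model predicts only about 4000 rows for 1..5.
print(estimate_between_rows(1_000_000, 1, 1000, 1, 5))
```

The second call shows why a single outlier matters: it stretches the assumed range, so the estimate for the range that actually holds almost all of the rows comes out far too low unless a histogram is available.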


Source: https://habr.com/ru/post/1389164/

