Why can't an index-only scan be used with an index created on COALESCE?

PostgreSQL 9.4. The table is created as follows:

CREATE TABLE foo ( id integer, date date, value numeric(14,3) ); 

I am optimizing a query that uses the window function ROW_NUMBER() and COALESCE. For maximum efficiency, I want the following query to use an Index Only Scan:

 SELECT id, c_val FROM ( SELECT id, COALESCE(value, 0) c_val, ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC NULLS LAST) rn FROM foo) sbt WHERE sbt.rn = 1; 

So, if I create an index as follows:

 CREATE INDEX ON foo (id, date DESC NULLS LAST, value); 

the planner chooses an Index Only Scan, but if I create it like this:

 CREATE INDEX ON foo (id, date DESC NULLS LAST, COALESCE(value, 0)); 

the planner only performs an Index Scan.

Why? I am trying to avoid the cost of evaluating COALESCE during query execution. Why doesn't this work with an Index Only Scan?
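A minimal sketch for reproducing the comparison, assuming a scratch database where foo can be dropped and recreated. The sample rows and the index names foo_plain_idx and foo_coalesce_idx are made up for illustration, and enable_seqscan is turned off only because a four-row table would otherwise simply be read sequentially. The VACUUM matters: an index-only scan can skip the heap only for pages marked all-visible in the visibility map, so without it the planner may avoid an index-only scan regardless of how the index is defined.

```sql
-- Scratch setup: the table from the question plus a few made-up rows.
DROP TABLE IF EXISTS foo;
CREATE TABLE foo ( id integer, date date, value numeric(14,3) );
INSERT INTO foo VALUES
  (1, '2015-01-01', 10),
  (1, '2015-02-01', NULL),
  (2, '2015-01-01', 5),
  (2, NULL, 7);

-- Variant 1 (hypothetical name): the raw column "value" is stored in the index.
CREATE INDEX foo_plain_idx ON foo (id, date DESC NULLS LAST, value);

-- Set visibility-map bits and refresh statistics.
-- VACUUM cannot run inside a transaction block.
VACUUM ANALYZE foo;

-- Tiny table: discourage sequential scans so the index paths show up.
SET enable_seqscan = off;

EXPLAIN
SELECT id, c_val
FROM (SELECT id, COALESCE(value, 0) c_val,
             ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC NULLS LAST) rn
      FROM foo) sbt
WHERE sbt.rn = 1;

-- Variant 2 (hypothetical name): swap in the expression index and compare.
DROP INDEX foo_plain_idx;
CREATE INDEX foo_coalesce_idx ON foo (id, date DESC NULLS LAST, COALESCE(value, 0));
ANALYZE foo;

EXPLAIN
SELECT id, c_val
FROM (SELECT id, COALESCE(value, 0) c_val,
             ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC NULLS LAST) rn
      FROM foo) sbt
WHERE sbt.rn = 1;

RESET enable_seqscan;
```

Compare the node names in the two EXPLAIN outputs. The question reports Index Only Scan for the first index and Index Scan for the second; exact plans can differ with the PostgreSQL version and statistics.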

1 answer

I think you wrongly assumed that COALESCE(value, 0) in your SELECT matters for index usage. In fact, it is only a transformation applied to the row values after the lookup has returned them.

What matters for index usage is your window function. First, you partition by id, and then you order the rows within each partition by date DESC NULLS LAST. These two things determine that an index such as CREATE INDEX ON foo (id, date DESC NULLS LAST, ...) is useful, no matter what you put in the subsequent positions. Note: if you swap id and date when creating the index, PostgreSQL will not use this index at all.
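The column-order point can be checked with a sketch like the following, again on a throwaway copy of foo with made-up rows; the index names foo_id_date_idx and foo_date_id_idx are hypothetical. Both indexes hold the same three columns and differ only in which column leads.

```sql
-- Scratch setup: the table from the question with a few made-up rows.
DROP TABLE IF EXISTS foo;
CREATE TABLE foo ( id integer, date date, value numeric(14,3) );
INSERT INTO foo VALUES
  (1, '2015-01-01', 10), (1, '2015-02-01', NULL),
  (2, '2015-01-01', 5),  (2, NULL, 7);

-- The window needs its input sorted by id, then by date DESC NULLS LAST:
--   ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC NULLS LAST)
-- An index whose leading columns have exactly that order can feed the
-- WindowAgg pre-sorted; trailing columns do not affect that ordering.
CREATE INDEX foo_id_date_idx ON foo (id, date DESC NULLS LAST, value);

-- Leading columns swapped: entries are ordered by date first, which is not
-- the order the window needs, so this index cannot replace the sort.
CREATE INDEX foo_date_id_idx ON foo (date DESC NULLS LAST, id, value);

VACUUM ANALYZE foo;              -- cannot run inside a transaction block
SET enable_seqscan = off;        -- tiny table: make index paths visible

EXPLAIN
SELECT id, COALESCE(value, 0) c_val,
       ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC NULLS LAST) rn
FROM foo;

RESET enable_seqscan;
```

With both indexes present, the plan would be expected to read foo_id_date_idx with no Sort node below the WindowAgg. After dropping foo_id_date_idx, the only remaining index is ordered by date first, so a Sort has to reappear.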

You should also know that an INDEX ONLY SCAN can only be used if the index itself stores the original, unmodified column values requested by the query. From the PostgreSQL documentation:

If the index stores the original indexed data values (and not some lossy representation of them), it is useful to support index-only scans, in which the index returns the actual data ...

In your case, your second index stores a lossy representation, because the value in its last column is transformed by a function, while the query needs id, value and date. PostgreSQL is not smart enough to see that this is just a substitution of NULLs by 0; for the planner, it is not an original value. Therefore, it has to access the table to get the original row values (which ends up as a plain INDEX SCAN). Only after that are the values prepared for output and COALESCE(value, 0) evaluated.
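The system catalog makes the "not an original value" point concrete. In pg_index.indkey, each index column is listed by the table column number it stores, and a 0 marks a column that holds an expression rather than a plain column. The planner's index-only check only counts plain columns, so for the expression index the column value is not available. A sketch, with hypothetical index names for the two indexes from the question:

```sql
-- Scratch setup: the table and both indexes from the question.
DROP TABLE IF EXISTS foo;
CREATE TABLE foo ( id integer, date date, value numeric(14,3) );
CREATE INDEX foo_plain_idx    ON foo (id, date DESC NULLS LAST, value);
CREATE INDEX foo_coalesce_idx ON foo (id, date DESC NULLS LAST, COALESCE(value, 0));

-- Table column numbers: id = 1, date = 2, value = 3.
-- indkey shows which table column each index column stores; 0 = expression.
--   foo_coalesce_idx -> 1 2 0   ("value" itself is not stored)
--   foo_plain_idx    -> 1 2 3
SELECT c.relname AS index_name,
       i.indkey,
       pg_get_indexdef(i.indexrelid) AS definition
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'foo'::regclass
ORDER BY c.relname;
```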

Edit:

I think that is enough explanation of the internals you asked about. As for the cost of COALESCE(), I agree with a_horse_with_no_name that you probably shouldn't worry about it.


Source: https://habr.com/ru/post/1233848/

