Indexing NULLs for Fast Lookup in DB2

My understanding is that NULLs are not indexed in DB2. Suppose we have a huge table (Sales) with a date column (sold_on) that usually holds a date but is NULL about 10% of the time.

Also suppose this is a legacy application that we cannot change, so these NULLs stay where they are and carry a meaning (say, sales that were returned).

We can make the following query fast by creating an index on the columns sold_on and total:

Select * from Sales where Sales.sold_on between date1 and date2 and Sales.total = 9.99 

But the same index will not make this query any faster:

 Select * from Sales where Sales.sold_on is null and Sales.total = 9.99 

because, as I understand it, the index only stores actual (non-NULL) values.

Is there a way to index NULLs? Perhaps by using a different type of index, or by indexing the column in some other way?

4 answers

I'm not a DB2 expert, but if 10% of your values are NULL, I don't think an index on this column will ever help your query. 10% is too many rows for an index to pay off; the optimizer will simply do a table scan. If you were talking about 2-3%, I think it would actually use your index.

Think about how many records fit on a page/block, say 20. The point of using an index is to avoid fetching pages you don't need. The probability that a given page contains no rows with a NULL sold_on is (90%)^20, or about 12%. Those are not good odds: about 88% of your pages would have to be read anyway, so the index doesn't buy you much.
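The page arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes, as the answer implicitly does, that NULL rows are scattered independently across pages; the function name and the 20-rows-per-page figure are illustrative, not anything DB2 exposes.

```python
# Back-of-the-envelope estimate: if a fraction p of rows match (here: sold_on
# IS NULL) and matching rows are scattered independently, a page holding r rows
# contains no matching row with probability (1 - p) ** r.

def fraction_pages_with_match(match_fraction: float, rows_per_page: int) -> float:
    """Expected fraction of pages holding at least one matching row
    (hypothetical helper; assumes independent placement of rows)."""
    return 1.0 - (1.0 - match_fraction) ** rows_per_page


if __name__ == "__main__":
    for p in (0.10, 0.03, 0.01):
        share = fraction_pages_with_match(p, rows_per_page=20)
        print(f"{p:.0%} NULLs, 20 rows/page -> {share:.0%} of pages must be read")
```

For the 10% case this gives roughly 0.88, i.e. the 88% quoted above. Trying smaller fractions shows why the break-even point is lower than intuition suggests: even a few percent of matching rows can touch a large share of the pages when rows are not clustered.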

If, however, the select list included only a few columns (rather than *), say just salesid, you could probably benefit from an index on (sold_on, salesid), because DB2 would not need to read the data pages at all: all the data would already be in the index.
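The index-only ("covering index") idea can be illustrated with SQLite through Python's built-in `sqlite3` module. This is an analogy, not DB2: SQLite's `EXPLAIN QUERY PLAN` labels an index-only access as `COVERING INDEX`, whereas in DB2 you would check the access plan with its EXPLAIN facility. The table layout and the `note` column are assumptions made for the example.

```python
import sqlite3


def plan_for(select_list: str) -> str:
    """Return SQLite's query-plan text for a sold_on IS NULL lookup
    (SQLite stand-in for DB2; schema is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (salesid INTEGER, sold_on TEXT, total REAL, note TEXT)"
    )
    # Index holds both the search column and the only column we want back.
    conn.execute("CREATE INDEX idx_sold_on_salesid ON sales (sold_on, salesid)")
    query = f"SELECT {select_list} FROM sales WHERE sold_on IS NULL"
    # Each plan row ends with a human-readable 'detail' string.
    detail = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
    conn.close()
    return detail


if __name__ == "__main__":
    print(plan_for("salesid"))  # answered from the index alone
    print(plan_for("*"))        # index lookup, then a visit to the table row
```

The contrast between the two calls is the point: selecting only `salesid` lets the engine skip the table entirely, while `SELECT *` forces it back to the data pages for `total` and `note`.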


Where did you get the impression that DB2 does not index NULLs? I can't find anything in the documentation or in articles supporting that claim. I just ran a query against a large table using an IS NULL condition on an indexed column that contains a small proportion of NULLs; in this case DB2 certainly used the index (verified with EXPLAIN, and the database responded instantly instead of spending time on a table scan).
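The same kind of check can be reproduced without a DB2 instance, using SQLite as a stand-in. This is only an analogy under stated assumptions (a made-up three-row `sales` table and an index named `idx_sold_on_total`); it shows that an ordinary B-tree index on a nullable column can serve the question's `IS NULL AND total = 9.99` query. For DB2 itself, the EXPLAIN output mentioned above is the authority.

```python
import sqlite3


def null_lookup_demo():
    """Run the question's IS NULL query against SQLite (not DB2) and
    return (plan text, result rows). Table and index are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (salesid INTEGER, sold_on TEXT, total REAL)")
    # Same composite index as in the question: (sold_on, total).
    conn.execute("CREATE INDEX idx_sold_on_total ON sales (sold_on, total)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, "2020-01-15", 9.99), (2, None, 9.99), (3, None, 4.50)],
    )
    query = "SELECT * FROM sales WHERE sold_on IS NULL AND total = 9.99"
    plan = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
    rows = conn.execute(query).fetchall()
    conn.close()
    return plan, rows


if __name__ == "__main__":
    plan, rows = null_lookup_demo()
    print(plan)  # the plan names idx_sold_on_total, i.e. no full table scan
    print(rows)
```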

So I maintain that DB2 has no problem with NULLs in non-primary-key indexes.

But, as others have written, your data may be distributed in such a way that DB2 decides using the index would not be faster. Or the database statistics for the tables in question may be out of date.


The rule of thumb is that an index is useful for values that occur in up to 15% of the rows, so an index could be useful here.

If DB2 does not index NULL values, I would suggest adding a boolean IsSold field and setting it to true whenever the sold_on date is set (this can be done with a trigger).

This is not an elegant solution, but it may be what you need.
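A minimal sketch of the trigger-maintained flag follows, run in SQLite via Python's `sqlite3` module so it can be executed anywhere. The column name `is_sold`, the trigger names and the `(is_sold, total)` index are assumptions for illustration; SQLite has no separate BOOLEAN type, so the flag is stored as 0/1. DB2's CREATE TRIGGER syntax differs (for example, DB2 row triggers use a `REFERENCING NEW AS ... FOR EACH ROW` clause), so treat this as the shape of the idea rather than DB2-ready DDL.

```python
import sqlite3


def build_sales_db() -> sqlite3.Connection:
    """Create an in-memory SQLite table whose hypothetical is_sold flag is
    kept in step with sold_on by triggers (SQLite syntax, not DB2)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (
            salesid INTEGER PRIMARY KEY,
            sold_on TEXT,
            total   REAL,
            is_sold INTEGER NOT NULL DEFAULT 0   -- 1 when sold_on is set
        );
        CREATE INDEX idx_is_sold_total ON sales (is_sold, total);

        -- Set the flag for newly inserted rows.
        CREATE TRIGGER trg_sales_is_sold_ins AFTER INSERT ON sales
        BEGIN
            UPDATE sales SET is_sold = (NEW.sold_on IS NOT NULL)
            WHERE salesid = NEW.salesid;
        END;

        -- Re-derive the flag whenever sold_on changes.
        CREATE TRIGGER trg_sales_is_sold_upd AFTER UPDATE OF sold_on ON sales
        BEGIN
            UPDATE sales SET is_sold = (NEW.sold_on IS NOT NULL)
            WHERE salesid = NEW.salesid;
        END;
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_sales_db()
    conn.executemany(
        "INSERT INTO sales (salesid, sold_on, total) VALUES (?, ?, ?)",
        [(1, "2020-01-15", 9.99), (2, None, 9.99), (3, None, 4.50)],
    )
    # Unsold rows are now found through an ordinary equality predicate.
    print(conn.execute("SELECT salesid FROM sales WHERE is_sold = 0").fetchall())
```

The query then becomes `WHERE is_sold = 0 AND total = 9.99`, which only uses equality predicates. Whether the optimizer actually picks the index still depends on selectivity, as the earlier answers point out.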


Troels is right: even rows whose SOLD_ON value is NULL can be found through the index on this column. If you search SOLD_ON by range, you can gain even more by creating a clustering index that starts with SOLD_ON. In this particular case, maintaining the clustering order on SOLD_ON may not cost much extra, since newly added rows will most likely have a later SOLD_ON date than existing ones.
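Why clustering helps a range search can be shown with a small simulation. This is a sketch under simplifying assumptions (20 rows per page, random dates, a ten-day range; all names are hypothetical) that models only the physical row order and ignores DB2 specifics such as free space and reorganization. In DB2 the clustering order is requested through the CLUSTER option of CREATE INDEX.

```python
import random

ROWS_PER_PAGE = 20  # assumption, same figure as in the first answer


def pages_touched(sold_on_by_slot, lo, hi):
    """Count distinct pages that hold at least one row with lo <= sold_on <= hi.
    Row slot i is assumed to live on page i // ROWS_PER_PAGE."""
    return len({
        slot // ROWS_PER_PAGE
        for slot, day in enumerate(sold_on_by_slot)
        if lo <= day <= hi
    })


def compare(n_rows=20_000, n_days=1_000, seed=42):
    """Return (pages read if clustered on sold_on, pages read if not)."""
    rng = random.Random(seed)
    unclustered = [rng.randrange(n_days) for _ in range(n_rows)]  # random order
    clustered = sorted(unclustered)  # same rows, stored in sold_on order
    lo, hi = 100, 109                # a ten-day range query
    return pages_touched(clustered, lo, hi), pages_touched(unclustered, lo, hi)


if __name__ == "__main__":
    c, u = compare()
    print(f"clustered: {c} pages, unclustered: {u} pages")
```

With clustering, the matching rows sit on a short run of adjacent pages; without it, they are spread over a large fraction of the table, which is the same effect the page-probability estimate in the first answer describes.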


Source: https://habr.com/ru/post/1277027/

