Database index on a column with duplicate values

Suppose there is a table containing information about employees, including a Gender column whose value can be either M or F. Would it be advisable to create an index on this column, and would it speed up searches? Logically, if we run a SELECT statement with a WHERE clause on the gender column, the index should cut the search time in half. But I have heard that this kind of index will not help and will be virtually ignored by the database optimizer during query execution. I don't understand why. Can someone explain?

2 answers

In most cases, only one index can be used to optimize a database query. If the query has to match on several indexed columns, the query planner must decide which of those indexes to use. Each index has a cardinality, which is roughly the number of distinct values in the indexed column. An index with higher cardinality is more effective, because selecting the rows that match it leaves fewer rows to check against the other conditions.
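To make the comparison concrete, here is a minimal sketch that counts distinct values per column and estimates how many rows remain after an equality match on that column. The table contents, the column names (`gender`, `department`, `email`), and the helper functions are all made up for illustration, and the estimate assumes values are evenly distributed.

```python
def make_employees(n=1000):
    """Build hypothetical employee rows; all names and values are illustrative."""
    return [
        {
            "id": i,
            "gender": "MF"[i % 2],             # 2 distinct values
            "department": f"dept{i % 20}",     # 20 distinct values
            "email": f"user{i}@example.com",   # unique per row
        }
        for i in range(n)
    ]


def distinct_values(rows, column):
    """Number of different values stored in the column."""
    return len({row[column] for row in rows})


def expected_rows_per_lookup(rows, column):
    """Average rows left after an equality match, assuming an even distribution."""
    return len(rows) / distinct_values(rows, column)


employees = make_employees()
for column in ("gender", "department", "email"):
    print(column, distinct_values(employees, column),
          expected_rows_per_lookup(employees, column))
```

In this made-up table, an equality match on gender still leaves half the rows to check, while a match on the unique column leaves a single row.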

The index in the gender column only cuts the table in half. Any other index will be more effective.

As an analogy, think of phone books. If you had one phone book for the whole country, it would be huge, and finding the specific person you want would be difficult. So phone books usually cover a single city, or several cities in an area, to keep them a reasonable size. But if, instead of regional phone books, you had a "male phone book" and a "female phone book", each would be almost as unusable as a single phone book for the whole country. The criterion for creating new phone books is that they should be much smaller than a book for the whole country. A reduction by a factor of 2 is not very useful when you start with a huge size.


Presumably, gender takes two values. In general, an index on gender would not help. In fact, it can hurt.

If you select by gender without an index, the query optimizer performs a full scan of the table's pages to satisfy the query. On a typical page, half of the records will match the query, so you start getting results from the first page read.

The point of using an index for a WHERE clause is usually to reduce the number of pages read. However, if every page contains records with both "M" and "F", every page has to be read anyway. Even worse, using the index means you read one random page, then another, and another, instead of reading the values sequentially. Jumping around between pages takes some extra time. If the pages do not all fit into memory, you get a situation called thrashing, and that can take a very, very long time.
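The claim that almost every page has to be read can be checked with a small simulation. This is only a sketch under stated assumptions: 100,000 rows, a hypothetical page size of 100 rows, and gender values assigned at random; real page sizes and data distributions depend on the database.

```python
import random


def pages_with_match(values, wanted, rows_per_page=100):
    """Return (pages containing at least one matching row, total pages)."""
    pages = [values[i:i + rows_per_page]
             for i in range(0, len(values), rows_per_page)]
    touched = sum(1 for page in pages if wanted in page)
    return touched, len(pages)


rng = random.Random(0)  # fixed seed so the run is repeatable
genders = [rng.choice("MF") for _ in range(100_000)]

touched, total = pages_with_match(genders, "F")
print(f"pages holding at least one 'F' row: {touched} of {total}")
```

With roughly half the rows matching and 100 rows per page, the chance that a page contains no match is negligible, so an index lookup ends up visiting the same pages as a full scan, only in a less convenient order.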

The only exception is a clustered index, where the rows on the pages are physically sorted by the indexed values. In that case, a query using the index will be approximately 50% faster, because only half of the pages have to be read. This can be especially effective for something like an "active" flag in an archive table, where the active records are accessed often. The flag might be set on 10%, 1%, or 0.1% of the records, and a clustered index can be a big improvement.
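The effect of clustering on a rare flag can be sketched the same way. The numbers here are assumptions chosen for illustration: 100,000 rows, a hypothetical page size of 100 rows, and an active flag set on 1% of the records. Sorting the list stands in for the physical ordering a clustered index would provide.

```python
import random


def pages_with_active(flags, rows_per_page=100):
    """Count pages that contain at least one active (True) row."""
    pages = [flags[i:i + rows_per_page]
             for i in range(0, len(flags), rows_per_page)]
    return sum(1 for page in pages if any(page))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
flags = [True] * (n // 100) + [False] * (n - n // 100)  # 1% active

rng.shuffle(flags)                                        # rows in arbitrary order
scattered = pages_with_active(flags)
clustered = pages_with_active(sorted(flags, reverse=True))  # active rows stored together

print("pages to read, scattered:", scattered)
print("pages to read, clustered:", clustered)
```

When the 1,000 active rows are stored together they fit in 10 pages, whereas scattered at random they are spread over a large share of the 1,000 pages in the table.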

In a large table, it would be rare to run a query that returns half the records. It is possible, though, that gender combined with other columns would be a good candidate for a multi-column index.
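As a rough illustration of combining gender with another column, the sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in for whatever database is actually in use. The table layout, the `department` column, and the index name `idx_dept_gender` are assumptions made for the example, and other engines may plan the query differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, gender TEXT, department TEXT)"
)
# Hypothetical data: 50 departments, genders split evenly within each one.
conn.executemany(
    "INSERT INTO employees (name, gender, department) VALUES (?, ?, ?)",
    [(f"emp{i}", "MF"[(i // 50) % 2], f"dept{i % 50}") for i in range(10_000)],
)
# The more selective column goes first; gender only narrows the match further.
conn.execute("CREATE INDEX idx_dept_gender ON employees (department, gender)")


def query_plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


sql = "SELECT name FROM employees WHERE department = ? AND gender = ?"
plan = query_plan(sql, ("dept7", "F"))
matches = len(conn.execute(sql, ("dept7", "F")).fetchall())

print(plan)
print("matching rows:", matches)
```

Here the two columns together narrow 10,000 rows down to 100, which is the kind of reduction that makes an index worth using.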


Source: https://habr.com/ru/post/981818/

