SQL Server: column-enabled indexes, what's the difference?

I never understood the difference between these two indexes, can anyone explain what the difference is (in performance, how will the structure of the index look like in db, in storage, etc.)?

I understand this question is wide, please tell me about it. I really don't know how to cover it. Perhaps if you guys start explaining your know-how, will I get pointers in the right direction, allowing me to narrow the question down?

Index Included

CREATE NONCLUSTERED INDEX IX_Address_PostalCode ON Person.Address (PostalCode) INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID); 

"Normal" index

 CREATE NONCLUSTERED INDEX IX_Address_PostalCode ON Person.Address (PostalCode, AddressLine1, AddressLine2, City, StateProvinceID); 
+6
source share
3 answers

Index memory uses the B-Tree structure and consists of “index pages” (root and all intermediate pages) and “index data pages” (leaf pages only).

Note. Do not confuse “index data pages” with “data pages” (leaf pages of clustered indexes) that store most columns of actual data.

  • Only index columns are stored on index pages.
  • By placing several columns in the INCLUDE section, less data per index key will be stored on each page.
  • Storing index keys requires fewer pages. (Simplifying the caching of these frequently used pages in memory longer.)
  • And perhaps fewer levels in the tree. (In this case, the performance benefits can be much greater, because each tree-level bypass is another access to the disk.)

When an index is used, the index key is used to navigate index pages to the desired page of index data.

  • If the index has INCLUDE columns, this data will be immediately available if the query is needed.
  • If the query requires that the columns are not accessible in either the index keys or INCLUDE columns, an additional bookmark search is required for the correct row in the cluster index (or heap if the cluster index is not defined).

Some things to note that hopefully affect some of your confusions:

  • If the keys of your index and filters in your query are not selective enough, then the index will be ignored (regardless of what is in the INCLUDE columns).
  • Each index you create has overhead for the INSERT and UPDATE statements; especially for "large" indexes. (Bigger refers to INCLUDE columns.)
  • So, although theoretically you can create many large indexes with columns included to match all permutations of access paths: this will be very counterproductive.

It is worth noting that before INCLUDE columns were added as a function:

  • This was the usual “trick” index setting to expand the index keys to include columns that are not needed in the index / filter. (Known as coverage index.)
  • These columns are usually required in output columns or as reference columns for joining other tables.
  • This would have avoided the notorious “bookmark search”, but had the disadvantage of making the index “wider” than strictly necessary.
  • In fact, very often the early columns in the index already identified a unique row , which means that the added columns will be completely redundant, if not for the benefit of "avoiding bookmarking." Columns
  • INCLUDE mainly provide more efficient performance.

NB Something is very important to indicate. Usually you get a null advantage from INCLUDE columns in your indexes if you are lazy used to always write your queries as SELECT * ... By returning all columns , you basically guarantee that bookmark searches are necessary anyway.

+8
source

In the first index Index page index, only the PostalCode is the key column, and AddressLine1, AddressLine2, City, StateProvinceID are part of the node sheet to avoid key/RID search

I would prefer the first index when my table is always filtered by PostalCode , and any of these columns AddressLine1, AddressLine2, City, StateProvinceID will be part of select , not filtering

 select AddressLine1, AddressLine2, City, StateProvinceID from Person.Address Where PostalCode= 

The second index in the Index page will have five key columns: PostalCode, AddressLine1, AddressLine2, City, StateProvinceID

I would prefer a second index when I have the ability to filter data like

 Where PostalCode = And AddressLine1 = 

or

 Where PostalCode = And AddressLine2 = 

or

 Where PostalCode = And AddressLine1 = and AddressLine2 = 

etc.

In any case, the first column in the index must be part of the filtering to use the index

+3
source

In the first example, only the index column: PostalCode is stored in the index tree with all the other columns stored at the index sheet level. This makes the index smaller and useful if you do not use the Join group to other columns, but only against PostalCode.

In the second index, all data for all columns is stored in the index tree, which makes the index much larger, but it is useful if you use any of the columns in WHERE / JOIN / GROUP BY / ORDER By statement.

Include columns allows you to get data faster when they are listed in the selection list.

For example, if you work:

 SELECT PostalCode, AddressLine1, AddressLine2, City, StateProvinceID FROM Person.Address Where PostalCode= 'A1234' 

This will help to create an index in PostalCode and include all other columns.

On the other hand, if you work:

 SELECT PostalCode, AddressLine1, AddressLine2, City, StateProvinceID FROM Person.Address Where PostalCode= 'A1234' or City = 'London' or StateProvinceID = 1 or AddressLine1 = 'street A' or AddressLine2 = 'StreetB' 

This will benefit more from having all the columns in the index.

Take a look at the links below, this may help you with your request.

Column enabled index: https://msdn.microsoft.com/en-us/library/ms190806(v=sql.105).aspx

Organization of tables and indexes: https://msdn.microsoft.com/en-us/library/ms189051(v=sql.105).aspx

+1
source

Source: https://habr.com/ru/post/1014300/


All Articles