SQL Server Nonclustered Index

This question is about designing non-clustered indexes in SQL Server 2005.

I have a large table with several million rows. Rows are only ever read or inserted, and most operations are reads. I have looked at the various SELECT queries that access the table with an eye to improving read speed. Disk space is not really a concern. (Each row has a unique identifier, which I use as a field in the clustered index.)

My question is: if a non-clustered index indexes more columns than the query uses, does that lead to slower query execution than an index that exactly matches the query?

As the number of different queries increases, so does the number of column combinations used in their WHERE clauses. I'm not sure of the tradeoffs between many indexes with few columns each (one for each query) and fewer indexes with more columns.

For example, let's say I have two SELECT queries. The first uses columns A, B, C, and D in its WHERE clause, and the second uses A, B, E, and F. Would it be better to define two indexes, one on A/B/C/D and the other on A/B/E/F, or a single index on A/B/C/D/E/F?

+6
4 answers

First of all, the order of the columns in an index matters. Building and tuning your queries accordingly will let you use the indexes you have created effectively.

Whether to create two separate indexes or one combined index depends on the relationships between the columns in question and the types of queries being executed. In your example, if columns E and F are related to, or depend on, columns C and D, it may make sense to have one index covering all of the columns.
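
To make the two options concrete, here is a minimal sketch, assuming a hypothetical table dbo.BigTable with columns A through F (the table and column names are invented for illustration):

-- Option 1: two narrower indexes, one per query shape
CREATE NONCLUSTERED INDEX IX_BigTable_ABCD ON dbo.BigTable (A, B, C, D);
CREATE NONCLUSTERED INDEX IX_BigTable_ABEF ON dbo.BigTable (A, B, E, F);

-- Option 2: one wider index covering both queries;
-- only the shared leading columns A, B help both query shapes,
-- since a WHERE on A, B, E, F can seek on A and B but must then filter E and F
CREATE NONCLUSTERED INDEX IX_BigTable_ABCDEF ON dbo.BigTable (A, B, C, D, E, F);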

+3

My question is: if a non-clustered index indexes more columns than the query uses, does that lead to slower query execution than an index that exactly matches the query?

No, having more columns in the index does not slow down queries that use the first 1, 2, ..., n columns of the index. At the same time, if memory is constrained, loading a wider index into memory can force other things out of the buffer cache and slow queries down; with plenty of memory this should not be a problem.

As the number of different queries increases, so does the number of column combinations used in their WHERE clauses. I'm not sure of the tradeoffs between many indexes with few columns each (one for each query) and fewer indexes with more columns.

You should put the most frequently queried and most selective fields first in your indexes. Fewer indexes with many columns may not give you what you want.

For example, if you have an index with the following columns:

  • ColumnA
  • ColumnB
  • ColumnC
  • ColumnD
  • ColumnE
  • ColumnF

in that order, queries that filter on ColumnA; on ColumnA and ColumnB; on ColumnA, ColumnB, and ColumnC; and so on can use the index, but a query that filters only on ColumnE or ColumnF will not use it.
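
A minimal sketch of that leading-column rule, again using the hypothetical dbo.BigTable:

CREATE NONCLUSTERED INDEX IX_BigTable_Wide
    ON dbo.BigTable (ColumnA, ColumnB, ColumnC, ColumnD, ColumnE, ColumnF);

-- Can seek: the filter covers a leading prefix of the index key
SELECT * FROM dbo.BigTable WHERE ColumnA = 1 AND ColumnB = 2;

-- Cannot seek on this index: ColumnE is not a leading column, so the
-- optimizer must scan (or use another index) and then filter the rows
SELECT * FROM dbo.BigTable WHERE ColumnE = 5;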

Taking a different approach, suppose instead you have six indexes on the same table, each on a single column:

  • Index1 - ColumnA
  • Index2 - ColumnB
  • Index3 - ColumnC
  • Index4 - ColumnD
  • Index5 - ColumnE
  • Index6 - ColumnF

in this case, typically only one of these six indexes will be used for any given query.

Also, if the index contains a column that is not very selective, it may not help you. For example, if you have a GENDER column that can contain the values Male, Female, and Unknown, then including that column in an index probably will not help: when the query is executed, SQL Server may determine that the column is not selective enough and decide that a full table scan will be faster.
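
One way to check how selective a column actually is, assuming a hypothetical index IX_BigTable_Gender already exists, is to inspect its statistics; in the density output, a value close to 1 means low selectivity:

DBCC SHOW_STATISTICS ('dbo.BigTable', 'IX_BigTable_Gender');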

There are many ways to find out which indexes your queries use, but one approach I take is to look for indexes that are never used. Run the following query in your database to see whether the indexes you have created are actually being used.

SELECT iv.table_name,
       i.name AS index_name,
       iv.seeks + iv.scans + iv.lookups AS total_accesses,
       iv.seeks,
       iv.scans,
       iv.lookups,
       t.indextype,
       t.indexsizemb
FROM   (SELECT i.object_id,
               Object_name(i.object_id) AS table_name,
               i.index_id,
               SUM(i.user_seeks) AS seeks,
               SUM(i.user_scans) AS scans,
               SUM(i.user_lookups) AS lookups
        FROM   sys.tables t
               INNER JOIN sys.dm_db_index_usage_stats i
                       ON t.object_id = i.object_id
        GROUP  BY i.object_id, i.index_id) AS iv
       INNER JOIN sys.indexes i
               ON iv.object_id = i.object_id
              AND iv.index_id = i.index_id
       INNER JOIN (SELECT sys_schemas.name AS schemaname,
                          sys_objects.name AS tablename,
                          sys_indexes.name AS indexname,
                          sys_indexes.type_desc AS indextype,
                          CAST(partition_stats.used_page_count * 8 / 1024.00 AS DECIMAL(10, 3)) AS indexsizemb
                   FROM   sys.dm_db_partition_stats partition_stats
                          INNER JOIN sys.indexes sys_indexes
                                  ON partition_stats.[object_id] = sys_indexes.[object_id]
                                 AND partition_stats.index_id = sys_indexes.index_id
                                 AND sys_indexes.type_desc <> 'HEAP'
                          INNER JOIN sys.objects sys_objects
                                  ON sys_objects.[object_id] = partition_stats.[object_id]
                          INNER JOIN sys.schemas sys_schemas
                                  ON sys_objects.[schema_id] = sys_schemas.[schema_id]
                                 AND sys_schemas.name <> 'SYS') AS t
               ON t.indexname = i.name
              AND t.tablename = iv.table_name
--WHERE t.IndexSizeMB > 200
WHERE  iv.seeks + iv.scans + iv.lookups = 0
ORDER  BY total_accesses ASC;

I generally track indexes that were never used, or that have not been used for a few months since SQL Server was last restarted (the usage statistics reset on restart), and decide whether to delete them. Too many indexes can sometimes slow SQL Server down in finding the best path to run a query, and deleting unused indexes can speed things up.
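
If you decide an index really is dead weight, dropping it is a one-liner (the index and table names here are hypothetical):

DROP INDEX IX_BigTable_Unused ON dbo.BigTable;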

Hope this helps you understand your indexes.

+1

The existing answers are already very good. Here's a new thought: finding the optimal set of indexes for a given workload and amount of available memory is a complex problem with a huge search space, and finding the true optimum would require an exhaustive search of it.

The Database Engine Tuning Advisor (DTA) does just that! I recommend that you record a representative workload (including writes!) and let the DTA give you suggestions. It also takes disk space into account.
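
For example, a typical invocation of the command-line dta utility might look like the sketch below; the server, database, and file names are placeholders, so check the dta documentation for the exact options in your version:

dta -S MyServer -E -D MyDatabase -if workload.sql -s TuningSession1 -of recommendations.sql

Here -E uses a trusted connection, -if points at the recorded workload, and -of writes the recommended index script to a file.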

+1

Disk space is not a problem.

Please do not think that way. It doesn't matter whether you have 500 GB of free space. The larger a table or index, the longer it takes to read from disk, and the more room it takes up in memory (i.e., in the buffer pool), the more logical reads will be required to satisfy the query. For more information on this topic, see: http://www.sqlservercentral.com/articles/data-modeling/71725/

(Each row has a unique identifier, which I use as a field in the clustered index.)

Do most of your queries use this identifier in a WHERE clause? If not, then it might not be the best choice for the clustered index.

My question is: if a non-clustered index indexes more columns than the query uses, does that lead to slower query execution than an index that exactly matches the query?

It depends on several factors. How many more fields are we talking about? One TINYINT field of 1 byte, or several fields totaling 300 bytes? If you are not using filtered indexes, you need to multiply the size of your index key, plus the size of your clustered index key (for non-UNIQUE indexes), by the number of rows. As I mentioned above, larger structures take longer to read, but realistically an extra 5 MB on top of 100 MB will probably not make a noticeable difference.
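
As an aside, since filtered indexes came up: they were introduced in SQL Server 2008 (so they are not available on 2005), and they shrink that multiplication by storing only the rows that match a predicate. A sketch with hypothetical table and column names:

CREATE NONCLUSTERED INDEX IX_BigTable_Active
    ON dbo.BigTable (A, B)
    WHERE Status = 'Active';  -- only rows with Status = 'Active' are stored in the index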

Keep in mind that index design is both an art and a science. You need to consider which queries will run most often and which ORDER BY and WHERE clauses they use. You also need to remember that an index will not be used if its leading column is not in the query, even if the rest of the index's fields are.

Generally speaking, you do NOT want to index each field separately, because:

  • too many indexes slow down DML operations, which is a problem even if most operations on this table are SELECTs
  • too many indexes increase the likelihood of blocking
  • a query filtering on 4 fields will not use 4 separate indexes; most of the time the optimizer picks the one it estimates will work best, and sometimes it may combine two of them, especially if you have an OR condition

For example, let's say I have two SELECT queries. The first uses columns A, B, C, and D in its WHERE clause, and the second uses A, B, E, and F.

You might do best by indexing only A and B and seeing how that performs. If the A, B combination is unique, consider it a candidate for a composite clustered index. If it is not unique but is still used by most queries, consider a clustered index on A, B, IDfield: putting IDfield last makes the combination unique. This matters because if your clustered index is not the primary key, you really should declare it UNIQUE so that SQL Server does not add a hidden uniquifier field. (A primary key is, by definition, unique.)
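
A minimal sketch of that suggestion, with the same hypothetical table and column names as above:

-- IDfield last guarantees the key combination is unique,
-- so no hidden uniquifier is needed
CREATE UNIQUE CLUSTERED INDEX CIX_BigTable_A_B_ID
    ON dbo.BigTable (A, B, IDfield);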

Also review the INCLUDE parameter for indexes.
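
INCLUDE (available since SQL Server 2005) adds non-key columns to the leaf level of a nonclustered index, which can make the index covering without widening its key; a sketch with hypothetical names:

-- A query filtering on A, B and selecting C, D can be answered
-- entirely from this index, with no key lookup into the table
CREATE NONCLUSTERED INDEX IX_BigTable_AB_incl_CD
    ON dbo.BigTable (A, B)
    INCLUDE (C, D);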

And yes, the order of the columns matters, because it determines how the index is organized. Consider the difference between (ActionDate, CustomerID) and (CustomerID, ActionDate). With ActionDate first, it is easy to find all CustomerIDs within a specific date range. But if you care about only one customer and want several different dates for them, you would have to scan across the entire index, since that customer's data is spread throughout it. In that case you are better off with CustomerID first: you can quickly narrow down to where that customer's data starts and then grab just the rows you need based on the dates.
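
A sketch of the two orderings (the table and columns are hypothetical):

-- Good for date-range queries across all customers
CREATE NONCLUSTERED INDEX IX_Orders_Date_Cust ON dbo.Orders (ActionDate, CustomerID);

-- Good for pulling one customer's history, optionally narrowed by date
CREATE NONCLUSTERED INDEX IX_Orders_Cust_Date ON dbo.Orders (CustomerID, ActionDate);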

But no, the order of conditions in your WHERE clause has no bearing on whether the index will be used. SQL Server uses a cost-based optimizer that examines all of the conditions and uses index statistics (on the leading column) to determine which plan is likely to be best.

Finally, be sure to test various strategies. Don't just try one thing and move on. You were very general in your description (you didn't even give the fields' data types or how they are used), so any very specific recommendation here should be treated with caution. Use SET STATISTICS IO ON and look at the logical reads. The lower the number, the better!
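
For example (the query itself is a placeholder):

SET STATISTICS IO ON;
SELECT A, B, C FROM dbo.BigTable WHERE A = 1 AND B = 2;
SET STATISTICS IO OFF;
-- The Messages output then reports logical reads per table; lower is better.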

+1

Source: https://habr.com/ru/post/893005/

