SQL Server index to be grouped?

I have several indexes for some tables, they are all alike, and I want to know if the clustered index is in the correct column. Here are statistics from the two most active indexes:

Nonclustered I3_Identity (bigint) rows: 193,781 pages: 3821 MB: 29.85 user seeks: 463,355 user_scans: 784 user_lookups: 0 updates: 256,516 Clustered Primary Key I3_RowId (varchar(80)) rows: 193,781 pages: 24,289 MB: 189.76 user_seeks: 2,473,413 user_scans: 958 user_lookups: 463,693 updates: 2,669,261 

As you can see, PK is often queried, but all queries for the i3_identity column also do key searches for that PC, so am I really very useful for the I3_Identity index? Should I switch to using I3_Identity as clustered? This can have a huge impact as this table structure is repeated about 10,000 times when I work, so any help would be appreciated.

+7
sql sql-server indexing sql-server-2005
May 01 '09 at 19:32
source share
5 answers

Frederick understands this very well, and this is really what Kimberly Tripp also preaches: the clustering key must be stable (never changing), constantly increasing (IDENTITY INT), small and unique.

In your scenario, I would rather put the clustering key in the BIGINT column, and not in the VARCHAR (80) column.

First of all, with a BIGINT column it is reasonably easy to ensure uniqueness (if you do not use and guarantee uniqueness on your own, SQL Server will add a 4-byte “unique fier” for each of your rows) and it is MUCH less average than VARCHAR (80).

Why is size so important? A clustering key will also be added to EACH and each of your nonclustered indexes - so if you have many rows and many nonclustered indexes, then 40-80 bytes versus 8 bytes can quickly make a HUGE difference.

In addition, another performance tip: To avoid the so-called bookmark searches (from the value in your non-clustered index using the clustering key to the actual data pages), SQL Server 2005 introduced the concept of “including columns” in your non-clustered indexes. They are very useful and often overlooked. If your queries often require index fields, plus only one or two other fields from the database, consider including these questions to achieve what is called "index coverage." Again - see Kimberly Tripp's great article - it is SQL Server Indexing Goddess! :-) and it can explain that the material is much better than I can ...

So, to summarize: put your clustering key on a small, stable unique column - and you will do everything in order!

Mark

+8
May 01 '09 at 20:37
source share

quick 'n dirty:

Put the clustered index on:

  • a column that (almost) never changes

  • column for which values ​​of new records increase / decrease sequentially

  • column in which you search by range

+5
May 01 '09 at 20:16
source share

Here is the best discussion I've found about the topic. Kimberly Tripp is an MS blogger who remains at the top of the discussion. I could interpret this for you, but you obviously understand the basic words and concepts, and the article is very readable. So enjoy it!

Hint: You will find that short answers are almost always too simplistic.

+3
May 01 '09 at 20:13
source share

From what I read in the past, the two most important measures regarding index tables are the number of queries executed against the index and index density. Using DBCC_SHOWSTATISTICS ([table], [index]), you can study the density of the index. The idea is that you want your clustered index in columns to provide the most distinctness for each query.

In short, if you look at the “All Densities” measure from DBCC SHOW_STATISTICS and notice that the number is very low, this is a good indicator for a cluster. It makes sense to cluster by an index, which gives more uniqueness, but only if it is actively querying. Clustering on a rarely used index is likely to do more harm than good.

In the end is the solution. You can talk with your database administrator and analyze your code to find out where you get the most benefit. In this limited example, your indexing seems to be grouped in the right area if you only consider usage (and even if you consider the whole density, given the fact that the primary key provides the most uniqueness you can collect).

Edit: There is a pretty good article on MSDN that explains what SHOW_STATISTICS provides you with. Of course, I am not an uber DBA, but most of the information that I have provided here is in the manual of our DBA :)

Here's the article: http://msdn.microsoft.com/en-us/library/ms174384.aspx

+2
May 01 '09 at 20:07
source share

Typically, when I see key searches for the key key keykey / clustered, this means that I need to include (using the INCLUDE operator) more columns in a non-clustered key. Look at your queries and see which columns are selected / used in these statements. If you include these columns in a non-clustered key, you will no longer need to search for the keys.

+2
May 01 '09 at 20:18
source share



All Articles