SQL Server Query Performance - Clustered Index Search

Sorry for the long post, but below I have included a full script to create and populate a test harness.

My test harness has the following tables

|--------| |-------------| |-----| |--------------| |Column | |ColumnValue | |Row | |RowColumnValue| |--------| |-------------| |-----| |--------------| |ColumnId| |ColumnValueId| |RowId| |RowId | |Name | |ColumnId | |Name | |ColumnValueId | |--------| |Value | |-----| |--------------| |-------------| 

They represent rows and columns in a table. Possible cell values ​​in the column are stored in the ColumnValue column. The selected values ​​for the row are stored in RowColumnValue. (I hope this is clear)

I filled in the data using 10 columns, 10,000 rows, 50 column values ​​per column (500) and 25 selected column values ​​per row (250,000).

I have a dynamic sql that returns all rows separated by columns and contains an XML list of selected column values ​​for each column.

Note. For performance testing purposes, I wrapped the query in SELECT COUNT(*) so that the query does not return a large amount of data over the network.

My test harness fulfills this request (with the score) after about 5-6 seconds. The execution plan shows that 92% of the request is spent searching for the clustered index on [ColumnValue].[PK_ColumnValue] . Client statistics show the client processing time, the total execution time and the server wait time are all 0.

I understand that there are quite a few 250k rows in the RowColumnValue table, and I could expect too much from SQL Server. However, I expect the query should work much faster than that. Or at least the execution plan should represent the other neck of the bottle, not a clustered index search.

Can someone shed some light on the problem or give me some tips on how to make this more efficient?

Dynamic SQL that launches a summary to display the table:

 DECLARE @columnDataList NVARCHAR(MAX) SELECT @columnDataList = CAST ( ( SELECT ', CONVERT(xml, [PVT].[' + [Column].[Name] + ']) [Column.' + [Column].[Name] + ']' FROM [Column] ORDER BY [Column].[Name] FOR XML PATH('') ) AS XML ).value('.', 'NVARCHAR(MAX)') DECLARE @columnPivotList NVARCHAR(MAX) SELECT @columnPivotList = CAST ( ( SELECT ', [' + [Column].[Name] + ']' FROM [Column] ORDER BY [Column].[Name] FOR XML PATH('') ) AS XML ).value('.', 'NVARCHAR(MAX)') EXEC(' SELECT COUNT(*) FROM ( SELECT [PVT].[RowId] ' + @columnDataList + ' FROM ( SELECT [Row].[RowId], [Column].[Name] [ColumnName], [XmlRowColumnValues].[XmlRowColumnValues] [XmlRowColumnValues] FROM [Row] CROSS JOIN [Column] CROSS APPLY ( SELECT [ColumnValue].[Value] [Value] FROM [RowColumnValue] INNER JOIN [ColumnValue] ON [ColumnValue].[ColumnValueId] = [RowColumnValue].[ColumnValueId] WHERE [RowColumnValue].[RowId] = [Row].[RowId] AND [ColumnValue].[ColumnId] = [Column].[ColumnId] FOR XML PATH (''''), ROOT(''Values'') ) [XmlRowColumnValues] ([XmlRowColumnValues]) ) [PivotData] PIVOT ( MAX([PivotData].[XmlRowColumnValues]) FOR [ColumnName] IN ([0]' + @columnPivotList + ') ) PVT ) RowColumnData ') 

Script to create and populate the database:

 SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Row]( [RowId] [int] IDENTITY(1,1) NOT NULL, [Name] [nvarchar](50) NOT NULL, CONSTRAINT [PK_Row] PRIMARY KEY CLUSTERED ( [RowId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Column]( [ColumnId] [int] IDENTITY(1,1) NOT NULL, [Name] [nvarchar](50) NOT NULL, CONSTRAINT [PK_Column] PRIMARY KEY CLUSTERED ( [ColumnId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[RowColumnValue]( [RowId] [int] NOT NULL, [ColumnValueId] [int] NOT NULL, CONSTRAINT [PK_RowColumnValue] PRIMARY KEY CLUSTERED ( [RowId] ASC, [ColumnValueId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[ColumnValue]( [ColumnValueId] [int] IDENTITY(1,1) NOT NULL, [ColumnId] [int] NOT NULL, [Value] [nvarchar](50) NOT NULL, CONSTRAINT [PK_ColumnValue] PRIMARY KEY CLUSTERED ( [ColumnValueId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE NONCLUSTERED INDEX [FKIX_ColumnValue_ColumnId] ON [dbo].[ColumnValue] ( [ColumnId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO ALTER TABLE [dbo].[ColumnValue] WITH CHECK ADD CONSTRAINT [FK_ColumnValue_Column] FOREIGN KEY([ColumnId]) REFERENCES [dbo].[Column] ([ColumnId]) GO ALTER TABLE [dbo].[ColumnValue] CHECK CONSTRAINT [FK_ColumnValue_Column] GO ALTER TABLE [dbo].[RowColumnValue] WITH CHECK ADD CONSTRAINT [FK_RowColumnValue_ColumnValue] FOREIGN KEY([ColumnValueId]) REFERENCES [dbo].[ColumnValue] ([ColumnValueId]) GO ALTER TABLE [dbo].[RowColumnValue] CHECK CONSTRAINT [FK_RowColumnValue_ColumnValue] GO ALTER TABLE [dbo].[RowColumnValue] WITH CHECK ADD CONSTRAINT [FK_RowColumnValue_Row] FOREIGN KEY([RowId]) REFERENCES [dbo].[Row] ([RowId]) GO ALTER TABLE [dbo].[RowColumnValue] CHECK CONSTRAINT [FK_RowColumnValue_Row] GO DECLARE @columnLoop INT DECLARE @columnValueLoop INT DECLARE @rowLoop INT DECLARE @columnId INT DECLARE @columnValueId INT DECLARE @rowId INT SET @columnLoop = 0 WHILE @columnLoop < 10 BEGIN INSERT INTO [Column] ([Name]) VALUES(NEWID()) SET @columnId = @@IDENTITY SET @columnValueLoop = 0 WHILE @columnValueLoop < 50 BEGIN INSERT INTO [ColumnValue] ([ColumnId], [Value]) VALUES(@columnId, NEWID()) SET @columnValueLoop = @columnValueLoop + 1 END SET @columnLoop = @columnLoop + 1 END SET @rowLoop = 0 WHILE @rowLoop < 10000 BEGIN INSERT INTO [Row] ([Name]) VALUES(NEWID()) SET @rowId = @@IDENTITY INSERT INTO [RowColumnValue] ([RowId], [ColumnValueId]) SELECT TOP 25 @rowId, [ColumnValueId] FROM [ColumnValue] ORDER BY NEWID() SET @rowLoop = @rowLoop + 1 END 
+4
source share
1 answer

I agree with @marc_s and @KM that this grand design is doomed from the start.

Millions of hours of Microsoft developers went to the creation and fine-tuning of a powerful and powerful database engine, but you are going to invent all this, sorting everything into a small number of shared tables and re-implementing everything that SQL Server has already been developed for you.

SQL Server already has tables containing entity names, column names, etc. The fact that you usually don’t interact directly with these system tables is a good thing: it is called abstraction. And it is unlikely that you are going to perform this abstraction better than SQL Server.

At the end of the day with your approach (a) even the simplest queries will be monstrous; and (b) you will never get closer to optimal performance because you refuse all query optimization that you could get for free.

Without knowing anything about your application or your requirements, it is difficult to give any specific advice. But I would suggest that some good old normalization has come a long way. Any well-implemented nontrivial database has many tables; ten tables plus ten xtab tables shouldn't scare you away.

And don't be afraid of generating SQL code as a way to implement common interfaces across disparate tables. A little can go a long way.

+2
source

Source: https://habr.com/ru/post/1303233/


All Articles