SQL Server table partitioning based on a modulo function?

I have a really large table (over 10 million rows) that is starting to show signs of poor query performance. Since this table is likely to double or triple in size relatively soon, I am looking for a table layout that will squeeze out some query performance.

The table looks something like this:

    CREATE TABLE [my_data] (
        [id] [int] IDENTITY(1,1) NOT NULL,
        [topic_id] [int] NULL,
        [data_value] [decimal](19, 5) NULL
    )

So, a bunch of values for any given topic. Queries against this table will always be by topic ID, so there is a clustered index on (id, topic_id).

Anyway, since topic IDs are not bounded (any number of topics can be added), I would like to try partitioning this table on a modulo function of the topic ID. So something like:

    topic_id % 4 == 0 => partition 0
    topic_id % 4 == 1 => partition 1
    topic_id % 4 == 2 => partition 2
    topic_id % 4 == 3 => partition 3

However, I have not seen any way to tell CREATE PARTITION FUNCTION or CREATE PARTITION SCHEME to perform this operation when deciding on a partition.

Is it possible? How can we create a partition function based on an operation performed on an input value?

4 answers

You just need to create the modulo column as a PERSISTED computed column.

Blue Peter style, here is one I made earlier (although I'm not sure I have the partition boundary values right):

    -- RANGE RIGHT: each boundary value starts a new partition,
    -- so remainders 0, 1, 2, 3 land in partitions 1 to 4.
    CREATE PARTITION FUNCTION [PF_PartitonFour] (int)
    AS RANGE RIGHT FOR VALUES (1, 2, 3)
    GO

    CREATE PARTITION SCHEME [PS_PartitionFourScheme]
    AS PARTITION [PF_PartitonFour]
    TO ([TestPartitionGroup1], [TestPartitionGroup2], [TestPartitionGroup3], [TestPartitionGroup4])
    GO

    CREATE TABLE [my_data] (
        [id] [int] IDENTITY(1,1) NOT NULL,
        [topic_id] [int] NULL,
        [data_value] [decimal](19, 5) NULL,
        -- The partitioning column must be PERSISTED when it is computed.
        [PartitionElement] AS ([topic_id] % 4) PERSISTED
    ) ON [PS_PartitionFourScheme] (PartitionElement);
    GO
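Since the boundary values are the uncertain part, it may help to ask SQL Server directly which partition each remainder maps to. The sketch below assumes the partition function and table from the answer above already exist; the sample topic IDs are made up for illustration. `$PARTITION` and `sys.partitions` are built in.

```sql
-- Which partition does each remainder map to?
-- (The VALUES row constructor needs SQL Server 2008+;
--  on 2005, use SELECT ... UNION ALL SELECT ... instead.)
SELECT v.topic_id,
       v.topic_id % 4                              AS PartitionElement,
       $PARTITION.PF_PartitonFour(v.topic_id % 4)  AS partition_number
FROM (VALUES (0), (1), (2), (3), (4), (5)) AS v(topic_id);

-- How many rows does each partition of the table actually hold?
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'my_data')
  AND p.index_id IN (0, 1)   -- heap or clustered index only
ORDER BY p.partition_number;
```

If two different remainders report the same partition_number, the boundary values or the RANGE LEFT/RIGHT choice need adjusting.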

Hash partitioning is not available in SQL Server 2005/2008. You must use range partitioning.

That said, you should be aware that partitioning is primarily a storage option; see Partitioned Table and Index Concepts:

Partitioning makes large tables or indexes more manageable, because partitioning enables you to manage and access subsets of data quickly and efficiently, while maintaining the integrity of a data collection. By using partitioning, an operation such as loading data from an OLTP to an OLAP system takes only seconds, instead of the minutes and hours the operation takes in earlier versions of SQL Server. Maintenance operations that are performed on subsets of data are also performed more efficiently because these operations target only the data that is required, instead of the whole table.

As you can see, the MSDN introduction to partitioning focuses on maintenance, manageability, and data loading. In my experience, partitioning offers, at best, no performance gain, especially in SQL 2005; usually it results in worse performance. For performance benefits you should use a proper clustered index and properly designed nonclustered indexes.

In SQL 2008 there are improvements to parallel operators over partitions, provided the partitions are properly distributed in terms of I/O; see Designing Partitions to Improve Query Performance. Their benefit is marginal, though, and is overshadowed by the benefits of a properly designed set of clustered and nonclustered indexes. A case in point: your clustered index on (id, topic_id), where id is an identity, is useful only for single-item lookups by id. A clustered index on (topic_id, id), on the other hand, would benefit any query that looks for specific topics. I don't know your exact system requirements or the queries you run, but performance problems at 10M rows on such a narrow table smell like an indexing and querying problem, not a partitioning problem.
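As a rough sketch of the reordered clustered index, assuming the table currently has no clustered index (an existing one would have to be dropped or rebuilt first); the index name and the sample topic ID are hypothetical:

```sql
-- Hypothetical index name. UNIQUE assumes id values are never
-- duplicated, which holds for an IDENTITY column unless rows are
-- inserted with IDENTITY_INSERT on.
CREATE UNIQUE CLUSTERED INDEX [CIX_my_data_topic_id_id]
ON [my_data] ([topic_id], [id]);

-- A typical per-topic query can now seek to the topic and read
-- its rows as one contiguous range.
SELECT [id], [data_value]
FROM [my_data]
WHERE [topic_id] = 42;   -- example topic ID
```

With topic_id as the leading key, all rows for one topic sit next to each other in the index, which is why this ordering tends to help per-topic queries.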


From the documentation, it seems that you need to give the function explicit boundary values:

To create 4 partitions ...

 CREATE PARTITION FUNCTION myRangePF1 (int) AS RANGE LEFT FOR VALUES (1, 100, 1000); 

Could you just do the calculations before this call to find the right boundary values, and then substitute them into the call? Or am I missing why you want to use the modulus? Since your IDs may have gaps, you may need some statistical math to work out where to split.

 CREATE PARTITION FUNCTION myRangePF1 (int) AS RANGE LEFT FOR VALUES (@low, @Med, @High); 
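One way the boundary calculation might look, as a sketch only: it splits the distinct topic IDs into four equally sized groups with NTILE and uses the top value of the first three groups as boundaries. It assumes the table holds at least four distinct non-NULL topic_id values, and it balances the number of topics per partition, not the number of rows.

```sql
DECLARE @low int, @med int, @high int;

-- Bucket the distinct topic IDs into quartiles, then take the
-- highest ID in each of the first three buckets as a boundary.
WITH q AS (
    SELECT t.topic_id,
           NTILE(4) OVER (ORDER BY t.topic_id) AS quartile
    FROM (SELECT DISTINCT topic_id
          FROM [my_data]
          WHERE topic_id IS NOT NULL) AS t
)
SELECT @low  = MAX(CASE WHEN quartile = 1 THEN topic_id END),
       @med  = MAX(CASE WHEN quartile = 2 THEN topic_id END),
       @high = MAX(CASE WHEN quartile = 3 THEN topic_id END)
FROM q;

-- RANGE LEFT: each boundary value is the last value of its partition.
CREATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES (@low, @med, @high);
```

Because new topics keep arriving with higher IDs, boundaries computed this way go stale: everything above @high piles into the last partition until the function is altered again.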

10 million rows is not much for SQL Server to handle; proper index design will likely solve this without any need for partitioning. As already noted, try clustering on a different set of columns; clustering on (topic_id, id) seems worth testing, especially if most queries filter on topic_id. Such a clustered index has much the same effect as partitioning, at least in that it groups related rows together on disk and allows a range scan to retrieve them quickly.

If that design works, all you should need to worry about is fragmentation from inserts, but that is manageable. Once the indexing is right, make sure you have enough RAM and that you are not bottlenecked on disk.
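A minimal sketch of keeping an eye on that fragmentation, using the built-in sys.dm_db_index_physical_stats function; the table name comes from the question, and when to reorganize versus rebuild is a judgment call not covered here:

```sql
-- Report fragmentation for every index on the table.
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'my_data'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;

-- Defragment in place; use REBUILD instead for heavy fragmentation.
ALTER INDEX ALL ON [my_data] REORGANIZE;
```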


Source: https://habr.com/ru/post/1299974/

