SQL Server 2008 Partitioned Tables and Parallelism

My company is migrating to SQL Server 2008 R2. We have a table with a lot of archived data. Most queries against this table filter on a DateTime value in the WHERE clause. For instance:

Query 1

SELECT COUNT(*) FROM TableA WHERE CreatedDate > '1/5/2010' and CreatedDate < '6/20/2010' 

I am assuming that the partitions are created on CreatedDate, that each partition is spread across several disks, that we have 8 processors, and that the table has 500 million records evenly distributed across dates from 1/1/2008 to 2/24/2011 (38 partitions). The data could also be partitioned by quarter or some other period, but let's keep the assumption of monthly partitions.

In this case I would expect all 8 processors to be used, and only the 6 partitions covering dates between 1/5/2010 and 6/20/2010 to be queried.
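A minimal sketch of what that setup could look like, assuming monthly partitions on CreatedDate. The partition function, partition scheme, and column names below (pfCreatedDateMonthly, psCreatedDateMonthly, Id, State) are hypothetical, and a real deployment would list one boundary per month and likely spread the filegroups across disks:

-- Hypothetical monthly partitioning on CreatedDate; boundary list abbreviated
CREATE PARTITION FUNCTION pfCreatedDateMonthly (DATETIME)
AS RANGE RIGHT FOR VALUES ('2008-02-01', '2008-03-01', /* ...one boundary per month... */ '2011-02-01');

-- Everything on PRIMARY here for simplicity; separate filegroups per disk are also common
CREATE PARTITION SCHEME psCreatedDateMonthly
AS PARTITION pfCreatedDateMonthly ALL TO ([PRIMARY]);

-- The table (via its clustered index) is created on the partition scheme
CREATE TABLE TableA
(
    Id          BIGINT      NOT NULL,
    State       VARCHAR(50) NOT NULL,
    CreatedDate DATETIME    NOT NULL,
    CONSTRAINT PK_TableA PRIMARY KEY CLUSTERED (CreatedDate, Id)
) ON psCreatedDateMonthly (CreatedDate);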

Now, suppose I ran the following query, with the same assumptions as above.

Query 2

 SELECT COUNT(*) FROM TableA WHERE State = 'Colorado' 

Questions:
1. Will all partitions be queried? Yes
2. Will all 8 processors be used to execute the query? Yes
3. Will performance be better than querying a table that is not partitioned? Yes
4. Is there anything else that I am missing?
5. How will a partitioned index help?

My answers to the first three questions above are based on my limited knowledge of SQL Server 2008 partitioned tables and parallelism. If my answers are incorrect, can you explain why I am wrong?

3 answers

Partitioning can improve performance - I have seen it many times. The reason partitioning was developed was, and is, performance, especially for inserts. Here is a real-world example:

I have several tables on a SAN that presents itself as one big disk, as far as we can tell. The SAN administrators insist that the SAN knows best, so they will not optimize the data distribution. How could a partition possibly help? Fact: it did and does.

We partitioned several tables using the same scheme (FileID % 200), with all 200 partitions on PRIMARY. What good would that be if the only reason to have a partitioning scheme were switching? None, but the purpose of partitioning is performance. You see, each of these partitions has its own paging. I can write data to all of them at once and there is no way to hit a deadlock: the pages cannot be locked against each other, because each writing process has a unique ID that maps to a partition. The 200 partitions increased performance 2000x (fact), and deadlocks dropped from 7,500 per hour to 3-4 per day. This is for the simple reason that page lock escalation always occurs with large amounts of data on a high-volume OLTP system, and page locks are what cause deadlocks. Partitioning, even on the same volume and filegroup, places the partitioned data on different pages, and lock escalation has no effect because processes are not trying to access the same pages.
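As a scaled-down sketch of that idea (4 buckets instead of the 200 described above, and all names are hypothetical), partitioning on a persisted computed column is one way to get a FileID % N layout entirely on PRIMARY:

-- 4 hash-style buckets instead of 200; names are illustrative only
CREATE PARTITION FUNCTION pfFileBucket (INT)
AS RANGE LEFT FOR VALUES (0, 1, 2);            -- 4 partitions for bucket values 0-3

CREATE PARTITION SCHEME psFileBucket
AS PARTITION pfFileBucket ALL TO ([PRIMARY]);  -- all partitions on PRIMARY, as in the example above

CREATE TABLE FileData
(
    FileID  INT          NOT NULL,
    Payload VARCHAR(MAX) NULL,
    Bucket  AS (FileID % 4) PERSISTED           -- persisted computed partitioning column
) ON psFileBucket (Bucket);

Concurrent writers with different FileID values then land in different partitions, and therefore on different pages, which is the effect described above.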

The benefit is there for data retrieval too, but it is not as great. Typically, though, a partitioning scheme is designed with the database's usage in mind. I am betting Remus designed his scheme around incremental loading (such as daily loads) rather than transactional processing. Now, if you were frequently selecting rows with locking (read committed), deadlocks could still occur when processes tried to access the same page at the same time.

But Remus is right - in your example I see no benefit; in fact there may be some overhead in finding the rows across the different partitions.


Partitioning is never an option for improving performance. The best you can hope for is performance on par with a non-partitioned table. Usually you get a regression that increases with the number of partitions. For performance you need indexes, not partitions. Partitions are intended for data management operations: ETL, archival, etc. Some argue that partition elimination is a possible performance gain, but for anything partition elimination can give, an index with its leading key on the same column as the partitioning column will give much better results.

Will all partitions be queried?

That query needs an index on State. Otherwise it is a table scan, and it will scan the entire table. A table scan over a partitioned table is always slower than a scan over a non-partitioned table of the same size. The index itself can be aligned on the same partition scheme, but the leading key must be State.
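For example (reusing the hypothetical psCreatedDateMonthly scheme and TableA columns sketched under the question), an aligned index with State as the leading key might look like this:

-- Aligned: partitioned on the same scheme, but State is the leading key
CREATE NONCLUSTERED INDEX IX_TableA_State
ON TableA (State)
INCLUDE (CreatedDate)
ON psCreatedDateMonthly (CreatedDate);

-- Non-aligned alternative on a single filegroup (faster, but rules out partition switching)
-- CREATE NONCLUSTERED INDEX IX_TableA_State ON TableA (State) ON [PRIMARY];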

Will all 8 processors be used to execute the query?

Parallelism has nothing to do with partitioning, despite the widespread misconception to the contrary. Both partitioned and non-partitioned range scans can use a parallel operator; that is the query optimizer's decision.
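As a rough illustration using the queries above: the degree of parallelism comes from the optimizer and the server or query settings, not from the number of partitions, and you can cap it with a hint or check it in the actual plan:

-- Cap (not force) the degree of parallelism with a query hint
SELECT COUNT(*)
FROM TableA
WHERE CreatedDate > '1/5/2010' AND CreatedDate < '6/20/2010'
OPTION (MAXDOP 8);

-- Return the actual execution plan XML to see whether parallel operators were used
SET STATISTICS XML ON;
SELECT COUNT(*) FROM TableA WHERE State = 'Colorado';
SET STATISTICS XML OFF;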

Will performance be better than querying a table that is not partitioned?

No

How will a partitioned index help?

An index will help. If the index has to be aligned, then it must be partitioned. A non-partitioned index will be faster than a partitioned one, but the index alignment requirement for switch-in/switch-out operations cannot be circumvented.

If you are looking at partitioning, it should be because you need fast switch-in/switch-out operations to delete data older than your retention policy period, or something similar. For performance, look at indexes, not partitions.
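A minimal switch-out sketch, assuming the hypothetical TableA layout from the question section: the staging table must be empty, sit on the same filegroup as the switched partition, and match the table's structure (any nonclustered indexes on TableA would need matching indexes here as well):

-- Staging table with the same structure and clustered index as TableA
CREATE TABLE StagingTableA
(
    Id          BIGINT      NOT NULL,
    State       VARCHAR(50) NOT NULL,
    CreatedDate DATETIME    NOT NULL,
    CONSTRAINT PK_StagingTableA PRIMARY KEY CLUSTERED (CreatedDate, Id)
) ON [PRIMARY];

-- Metadata-only operation: the oldest partition of TableA moves out almost instantly
ALTER TABLE TableA SWITCH PARTITION 1 TO StagingTableA;

-- The old data can now be archived or dropped without touching TableA
TRUNCATE TABLE StagingTableA;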


The very first question I have is whether your table has a clustered index. If not, you will want one.

Also, you will want a covering index for your queries (see "Covering Indexes").
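For example, a covering index for the State query in the question might look like this (index and column names are illustrative):

-- State is the key; CreatedDate is included so the query below never touches the base table
CREATE NONCLUSTERED INDEX IX_TableA_State_Covering
ON TableA (State)
INCLUDE (CreatedDate);

SELECT State, CreatedDate
FROM TableA
WHERE State = 'Colorado';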

If you have a lot of historical data, you might also look into an archiving process to help speed up your OLTP applications.


Source: https://habr.com/ru/post/1341191/

