Is a table with many columns still an anti-pattern when using a clustered columnstore index in SQL Server 2014?

Reading about the clustered columnstore index in SQL Server 2014, I wonder whether a table with a huge number of columns is still an anti-pattern. Currently, to mitigate the problem of a single table with a large number of columns, I use vertical partitioning, but with a clustered columnstore index available this should no longer be needed. Is this correct, or am I missing something?

Example: take a log of performance counters, where the raw data can have the following structure:

    ╔══════════════════╦═══════╦═══════╦═════╦═════╦═════╦══════════╗
    ║ Time             ║ Perf1 ║ Perf2 ║ ... ║ ... ║ ... ║ Perf1000 ║
    ╠══════════════════╬═══════╬═══════╬═════╬═════╬═════╬══════════╣
    ║ 2013-11-05 00:01 ║ 1     ║ 5     ║     ║     ║     ║ 9        ║
    ║ 2013-11-05 00:01 ║ 2     ║ 9     ║     ║     ║     ║ 9        ║
    ║ 2013-11-05 00:01 ║ 3     ║ 2     ║     ║     ║     ║ 9        ║
    ║ 2013-11-05 00:01 ║ 4     ║ 3     ║     ║     ║     ║ 9        ║
    ╚══════════════════╩═══════╩═══════╩═════╩═════╩═════╩══════════╝

Having such a table with 1000 columns is bad for two reasons: a single row will most likely span more than one page, and although you are rarely interested in all of the measures, every query still pays the I/O cost of reading the whole row. Vertical partitioning usually helps here: for example, you can split the performance counters into separate tables by category (CPU, RAM, etc.).

Conversely, with a clustered columnstore index on such a table this should not be a problem, because the data is stored column by column, and the I/O for each query involves only the requested columns, regardless of the total number of columns in the table.
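To make the reasoning above concrete, here is a rough back-of-envelope sketch in Python. It is not a model of SQL Server internals: the 8-byte value size, the row count, and the function names are all assumptions chosen for illustration, and it ignores columnstore compression and page overhead entirely. It only compares how many bytes a scan would touch if it has to read whole rows versus only the requested columns.

```python
# Back-of-envelope I/O model. All sizes are illustrative assumptions,
# not measurements of SQL Server behaviour.
BYTES_PER_VALUE = 8  # assumed fixed-width size of one stored value


def rowstore_bytes(rows: int, total_columns: int) -> int:
    """Row-oriented scan: every column of every scanned row is read."""
    return rows * total_columns * BYTES_PER_VALUE


def columnstore_bytes(rows: int, requested_columns: int) -> int:
    """Column-oriented scan: only the requested columns are read."""
    return rows * requested_columns * BYTES_PER_VALUE


rows = 1_000_000          # hypothetical number of logged samples
total_columns = 1001      # Time + Perf1 .. Perf1000
requested_columns = 3     # e.g. Time, Perf1, Perf2

row_io = rowstore_bytes(rows, total_columns)
col_io = columnstore_bytes(rows, requested_columns)

print(f"row-oriented scan:    {row_io:,} bytes")
print(f"column-oriented scan: {col_io:,} bytes")
```

Under these assumptions the column-oriented figure depends only on the number of requested columns, not on the table width, which is the point the question is making.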

+6
2 answers

It is certainly less "bad" than with row storage, but 1000 columns is pushing the limit. Our data warehouse typically has tables with 100 to 200 columns, and they work fine with a columnstore index. Assuming you have a perfect columnstore index, each query should only have to look at the specific columns it needs and should therefore be very efficient. But if the columnstore indexes are not optimal for the query, SQL Server has to do some jumping between the indexes, which is not good.

There is no rule of thumb. You will have to test to answer this question in your specific environment.

+1

The type of queries in your workload and the type of data in the table determine whether rowstore or columnstore will benefit you more. If queries look up a small set of rows, rowstore may perform better. If queries are data-warehouse style, such as scans over a large amount of data, columnstore will perform better. In addition, you can create a nonclustered columnstore index on your table. The query optimizer will then decide when to use the columnstore index and when to use other indexes.

I recommend reading the TechNet article with frequently asked questions about columnstore indexes.

-1

Source: https://habr.com/ru/post/957336/

