How to optimize SQL Server column alignment

I have a clustered column index table for our IOT metrics (timers data). It contains more than 1 billion lines and is structured like this:

CREATE TABLE [dbo].[Data](
[DeviceId] [bigint] NOT NULL,
[MetricId] [smallint] NOT NULL,
[TimeStamp] [datetime2](2) NOT NULL,
[Value] [real] NOT NULL
)

CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([TimeStamp],[DeviceId],[MetricId]) --WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)

Between 2008 and the present, there are about 10,000 different DeviceId values, and TimeStamps. A typical query for this table is as follows:

SET STATISTICS TIME, IO ON
SELECT
    [DeviceId]
    ,[MetricId]
    ,DATEADD(hh, DATEDIFF(day, '2005-01-01', [TimeStamp]), '2005-01-01') As [Date]
    ,MIN([Value]) as [Min]
    ,MAX([Value]) as [Max]
    ,AVG([Value]) as [Avg]
    ,SUM([Value]) as [Sum]
    ,COUNT([Value]) as [Count]
FROM
    [dbo].[Data]
WHERE
    [DeviceId] = 6077129891325167032
    AND [MetricId] = 1000
    AND [TimeStamp] BETWEEN '2017-07-01' AND '2017-07-30'
GROUP BY
    [DeviceId]
    ,[MetricId]
    ,DATEDIFF(day, '2005-01-01', [TimeStamp])
ORDER BY
    [DeviceId]
    ,[MetricId]
    ,DATEDIFF(day, '2005-01-01', [TimeStamp])

When I execute this query, I get this for performance metrics:

Since at the moment such a request, as indicated above, is too much reading segments, I consider:

Table 'Data'. Scan count 2, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 5257, lob physical reads 9, lob read-ahead reads 4000.
Table 'Data'. Segment reads 11, segment skipped 764.

Request Plan: Request Plan

This is not very optimized. I believe that 11 segments were read to extract only 212 out of 1 billion source lines (before grouping / aggregation)

, Niko Neugebauer https://github.com/NikoNeugebauer/CISL/blob/master/Azure/alignment.sql, :

Columnstore alignment

MetricId TimeStamp 100%. , DeviceId ? Clustered (Rowstore), , ?

+4
1

DeviceId rowstore , MAXDOP = 1 ( - , ). , :

CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([DeviceId],[TimeStamp],[MetricId]) --WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)

- CISL, :

insert into dbo.cstore_Clustering( TableName, Partition, ColumnName )
    VALUES ('[dbo].[Data]', 1, 'DeviceId' );

1 , , , . dbo.cstore_doAlignment, . ( , )

, Niko

+5

Source: https://habr.com/ru/post/1683064/


All Articles