Efficient way to store date ranges

Question

I need to store simple data - suppose I have some products with codes as a primary key, some properties and validity ranges. So the data might look like this:

    Products
    code   value  begin_date  end_date
    10905  13     2005-01-01  2016-12-31
    10905  11     2017-01-01  null

These ranges do not overlap, so on every date I have a list of unique products and their properties. Therefore, to facilitate its use, I created a function:

    create function dbo.f_Products
    (
        @date date
    )
    returns table
    as
    return (
        select p.code, p.value
        from dbo.Products as p
        where @date >= p.begin_date
          and @date <= p.end_date
    )

This is how I will use it:

    select *
    from <some table with product codes> as t
        left join dbo.f_Products(@date) as p on p.code = t.product_code

Everything works, but how can I let the optimizer know that these rows are unique per code, so that it produces a better execution plan?

I searched and found some really good articles on DDL that prevents overlapping ranges from being stored in the table.

But even if I add these constraints, I see that the optimizer cannot understand that the result set will contain unique codes.

What I would like is an approach that gives me basically the same performance as if I had stored the list of products for each specific date and selected from it using date = @date .

I know that some RDBMSs (e.g. PostgreSQL) have special data types ( Range Types ) for this. But SQL Server has nothing of the kind.

Am I missing something, or is there no way to do this properly in SQL Server?

sql sql-server intervals sql-server-2016 date-range

Roman pekar Nov 10 '16 at 16:45

5 answers

Aducci · Answer 1 · 2018-02-16T22:05:58+0000

You can create an indexed view containing one row for each code/date in a range.

    ProductDate (indexed view)
    code   value  date
    10905  13     2005-01-01
    10905  13     2005-01-02
    10905  13     ...
    10905  13     2016-12-31
    10905  11     2017-01-01
    10905  11     2017-01-02
    10905  11     ...
    10905  11     Today

Like this:

    create schema digits
    go

    create table digits.Ones (digit tinyint not null primary key)
    insert into digits.Ones (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

    create table digits.Tens (digit tinyint not null primary key)
    insert into digits.Tens (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

    create table digits.Hundreds (digit tinyint not null primary key)
    insert into digits.Hundreds (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

    create table digits.Thousands (digit tinyint not null primary key)
    insert into digits.Thousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)

    create table digits.TenThousands (digit tinyint not null primary key)
    insert into digits.TenThousands (digit) values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)
    go

    create schema info
    go

    create table info.Products (
        code int not null,
        [value] int not null,
        begin_date date not null,
        end_date date null,
        primary key (code, begin_date)
    )
    insert into info.Products (code, [value], begin_date, end_date) values
        (10905, 13, '2005-01-01', '2016-12-31'),
        (10905, 11, '2017-01-01', null)

    create table info.DateRange (
        [begin] date not null,
        [end] date not null,
        [singleton] bit not null default(1) check ([singleton] = 1)
    )
    insert into info.DateRange ([begin], [end]) values
        ((select min(begin_date) from info.Products), getdate())
    go

    create view info.ProductDate with schemabinding
    as
    select
        p.code,
        p.value,
        dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin]) as [date]
    from info.DateRange as dr
        cross join digits.Ones as ones
        cross join digits.Tens as tens
        cross join digits.Hundreds as huns
        cross join digits.Thousands as thos
        cross join digits.TenThousands as tthos
        join info.Products as p
            on dateadd(day, ones.digit + tens.digit*10 + huns.digit*100 + thos.digit*1000 + tthos.digit*10000, dr.[begin])
               between p.begin_date and isnull(p.end_date, datefromparts(9999, 12, 31))
    go

    create unique clustered index idx_ProductDate on info.ProductDate ([date], code)
    go

    select *
    from info.ProductDate with (noexpand)
    where date = '2014-01-01'

    drop view info.ProductDate
    drop table info.Products
    drop table info.DateRange
    drop table digits.Ones
    drop table digits.Tens
    drop table digits.Hundreds
    drop table digits.Thousands
    drop table digits.TenThousands
    drop schema digits
    drop schema info
    go

Shnugo · Answer 2 · 2016-11-10T17:02:50+0000

If the ranges have no gaps, you only need to store the start dates, and a solution could look like this:

    DECLARE @tbl TABLE(ID INT IDENTITY,[start_date] DATE);
    INSERT INTO @tbl VALUES({d'2016-10-01'}),({d'2016-09-01'}),({d'2016-08-01'}),({d'2016-07-01'}),({d'2016-06-01'});

    SELECT * FROM @tbl;

    DECLARE @DateFilter DATE={d'2016-08-13'};

    SELECT TOP 1 *
    FROM @tbl
    WHERE [start_date]<=@DateFilter
    ORDER BY [start_date] DESC

Important: make sure there is a (unique) index on start_date.
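For a table variable like the one in this answer, one way to get such an index is an inline UNIQUE constraint. The following is only a minimal sketch, assuming that start dates never repeat; on a permanent table a CREATE UNIQUE INDEX statement would serve the same purpose.

```sql
-- Sketch: the inline UNIQUE constraint gives the table variable a unique index
-- on start_date (assumption: no two rows share a start date).
DECLARE @tbl TABLE
(
    ID INT IDENTITY,
    [start_date] DATE NOT NULL UNIQUE
);

INSERT INTO @tbl VALUES ({d'2016-10-01'}),({d'2016-09-01'}),({d'2016-08-01'});

DECLARE @DateFilter DATE = {d'2016-08-13'};

-- Latest start date on or before the filter date
SELECT TOP 1 *
FROM @tbl
WHERE [start_date] <= @DateFilter
ORDER BY [start_date] DESC;
```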

UPDATE: for different products

    DECLARE @tbl TABLE(ID INT IDENTITY,ProductID INT,[start_date] DATE);
    INSERT INTO @tbl VALUES
    --product 1
     (1,{d'2016-10-01'}),(1,{d'2016-09-01'}),(1,{d'2016-08-01'}),(1,{d'2016-07-01'}),(1,{d'2016-06-01'})
    --product 2
    ,(2,{d'2016-10-17'}),(2,{d'2016-09-16'}),(2,{d'2016-08-15'}),(2,{d'2016-07-10'}),(2,{d'2016-06-11'});

    DECLARE @DateFilter DATE={d'2016-08-13'};

    WITH PartitionedCount AS
    (
        SELECT ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY [start_date] DESC) AS Nr
              ,*
        FROM @tbl
        WHERE [start_date]<=@DateFilter
    )
    SELECT *
    FROM PartitionedCount
    WHERE Nr=1

Anton · Answer 3 · 2018-02-15T23:45:37+0000

First, you need to create a unique clustered index on (begin_date, end_date, code).

Then the SQL engine will be able to do an index seek.
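As a rough sketch, such an index could be declared as follows. The index name is made up for illustration, and the statement assumes that dbo.Products does not already have a clustered index (a table can have only one).

```sql
-- Sketch: UCX_Products_range is a hypothetical name.
-- Assumes dbo.Products has no clustered index (or clustered primary key) yet.
CREATE UNIQUE CLUSTERED INDEX UCX_Products_range
    ON dbo.Products (begin_date, end_date, code);
```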

Alternatively, you can try creating a view that joins the dbo.Products table with a pre-populated dbo.Dates table.

    select p.code, p.value, p.begin_date, p.end_date, d.[date]
    from dbo.Products as p
        inner join dbo.Dates as d
            on p.begin_date <= d.[date] and d.[date] <= p.end_date

Then, in your function, you query this view with "where @date = view.date". The result may be better or slightly worse; it depends on the actual data.

You can also try indexing this view (depending on how often it is updated).

Alternatively, you can improve performance by storing in the dbo.Products table one row for each date in the range [begin_date] .. [end_date].

Vladimir Baranov · Answer 4 · 2018-02-16T05:12:57+0000

The approach with ROW_NUMBER scans the entire Products table once. It is the best method if the Products table has many product codes and only a few validity ranges for each code.

    WITH CTE_rn AS
    (
        SELECT
            code
            ,value
            ,ROW_NUMBER() OVER (PARTITION BY code ORDER BY begin_date DESC) AS rn
        FROM Products
        WHERE begin_date <= @date
    )
    SELECT *
    FROM <some table with product codes> as t
        LEFT JOIN CTE_rn ON CTE_rn.code = t.product_code AND CTE_rn.rn = 1
    ;

If you have only a few product codes and many validity ranges for each code in the Products table, then it is better to seek into the Products table for each code using OUTER APPLY .

    SELECT *
    FROM <some table with product codes> as t
        OUTER APPLY
        (
            SELECT TOP(1) Products.value
            FROM Products
            WHERE Products.code = t.product_code
              AND Products.begin_date <= @date
            ORDER BY Products.begin_date DESC
        ) AS A
    ;

Both options require a unique index on (code, begin_date DESC) include (value) .
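A minimal sketch of that index, assuming the table lives in the dbo schema; the index name is invented for illustration.

```sql
-- Sketch: UX_Products_code_begin is a hypothetical name; dbo is an assumed schema.
-- The key (code, begin_date DESC) lets the engine seek to a code and read
-- the most recent begin_date first; INCLUDE makes the index covering for [value].
CREATE UNIQUE NONCLUSTERED INDEX UX_Products_code_begin
    ON dbo.Products (code, begin_date DESC)
    INCLUDE ([value]);
```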

Note that these queries do not even look at end_date , since they assume that the intervals have no gaps. They will work in SQL Server 2008.

Pittsburgh dba · Answer 5 · 2018-02-16T19:37:36+0000

EDIT: My original answer used an INNER JOIN, but the asker wanted a LEFT JOIN.

    CREATE TABLE Products
    (
        [Code] INT NOT NULL
        , [Value] VARCHAR(30) NOT NULL
        , Begin_Date DATETIME NOT NULL
        , End_Date DATETIME NULL
    )

    /*
    Products
    code   value  begin_date  end_date
    10905  13     2005-01-01  2016-12-31
    10905  11     2017-01-01  null
    */
    INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 13, '2005-01-01', '2016-12-31')
    INSERT INTO Products ([Code], [Value], Begin_Date, End_Date) VALUES (10905, 11, '2017-01-01', NULL)

    CREATE NONCLUSTERED INDEX SK_ProductDate ON Products ([Code], Begin_Date, End_Date) INCLUDE ([Value])

    CREATE TABLE SomeTableWithProductCodes
    (
        [CODE] INT NOT NULL
    )

    INSERT INTO SomeTableWithProductCodes ([Code]) VALUES (10905)

Here is a prototype query with a date predicate. Note that there are more robust ways to do this, using a less-than operator on the upper bound, but that is another discussion.

    SELECT
        P.[Code]
        , P.[Value]
        , P.[Begin_Date]
        , P.[End_Date]
    FROM SomeTableWithProductCodes ST
        LEFT JOIN Products AS P
            ON ST.[Code] = P.[Code]
            AND '2016-06-30' BETWEEN P.[Begin_Date] AND ISNULL(P.[End_Date], '9999-12-31')

This query will perform an index seek on the Products table.
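The answer does not spell out the variant that uses a less-than operator on the upper bound, so the following is only one possible reading of it. It assumes that End_Date is stored as an exclusive bound, i.e. the first moment at which the row is no longer valid, so the first sample row would end on '2017-01-01' rather than '2016-12-31'.

```sql
-- Sketch under an assumption: End_Date is an EXCLUSIVE upper bound
-- (half-open interval [Begin_Date, End_Date)), not the last valid day.
DECLARE @date DATETIME = '2016-06-30';

SELECT
    P.[Code]
    , P.[Value]
    , P.[Begin_Date]
    , P.[End_Date]
FROM SomeTableWithProductCodes ST
    LEFT JOIN Products AS P
        ON ST.[Code] = P.[Code]
        AND P.[Begin_Date] <= @date
        AND @date < ISNULL(P.[End_Date], '9999-12-31');
```

With half-open intervals, consecutive rows share a boundary value (the End_Date of one row equals the Begin_Date of the next), so a DATETIME value with a time-of-day part cannot fall between two ranges.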

Here is the SQL script: SQL Fiddle - Products and Dates
