How to find date ranges in records with consecutive dates and duplicate data

Question


There is probably a simple solution for this, but I do not see it. I have a table with consecutive dates, and the related data is often duplicated across several consecutive dates:

    Date       Col1  Col2
    5/13/2010  1     A
    5/14/2010  1     A
    5/15/2010  2     B
    5/16/2010  1     A
    5/17/2010  1     A
    5/18/2010  3     C
    5/19/2010  3     C
    5/20/2010  3     C

Using MS T-SQL, I want to find the start and end dates for each run of the individual Col1 and Col2 values:

    StartDate  EndDate    Col1  Col2
    5/13/2010  5/14/2010  1     A
    5/15/2010  5/15/2010  2     B
    5/16/2010  5/17/2010  1     A
    5/18/2010  5/20/2010  3     C

Assumptions: no dates are ever missing, and Col1 and Col2 are never NULL. Any ideas, preferably without cursors? Thanks a lot, -alan


database sql-server tsql sql-server-2000

alan s May 15, '10 at 7:03


2 answers

Chris bednarski · Answer 1 · 2010-05-15T15:46:04+0000

For SQL Server 2005+, I think the query below will work:

    WITH DATES AS (
        SELECT COL1, COL2, DATE,
               DATEADD(DAY, -1 * ROW_NUMBER() OVER (PARTITION BY COL1, COL2 ORDER BY DATE), DATE) AS GRP
        FROM YOUR_TABLE
    )
    SELECT COL1, COL2, MIN(DATE) AS STARTDATE, MAX(DATE) AS ENDDATE
    FROM DATES
    GROUP BY COL1, COL2, GRP

If the table contains duplicate rows, use DENSE_RANK() instead of ROW_NUMBER().
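As a rough illustration of why subtracting the row number from the date works, the sketch below loads the question's sample rows into a table variable and prints the intermediate values. The name @t, the column types, and the RN column are assumptions made for this example, not part of the original answer.

```sql
-- Sample data from the question. @t and the column types are assumed for illustration.
DECLARE @t TABLE ([Date] DATETIME, Col1 INT, Col2 CHAR(1));

INSERT INTO @t ([Date], Col1, Col2)
SELECT '20100513', 1, 'A' UNION ALL
SELECT '20100514', 1, 'A' UNION ALL
SELECT '20100515', 2, 'B' UNION ALL
SELECT '20100516', 1, 'A' UNION ALL
SELECT '20100517', 1, 'A' UNION ALL
SELECT '20100518', 3, 'C' UNION ALL
SELECT '20100519', 3, 'C' UNION ALL
SELECT '20100520', 3, 'C';

-- Show each row with its row number (per Col1, Col2) and the derived group key.
SELECT [Date], Col1, Col2,
       ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY [Date]) AS RN,
       DATEADD(DAY, -1 * ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY [Date]), [Date]) AS GRP
FROM @t
ORDER BY [Date];

-- Date       Col1 Col2 RN  GRP
-- 5/13/2010  1    A    1   5/12/2010
-- 5/14/2010  1    A    2   5/12/2010
-- 5/15/2010  2    B    1   5/14/2010
-- 5/16/2010  1    A    3   5/13/2010
-- 5/17/2010  1    A    4   5/13/2010
-- 5/18/2010  3    C    1   5/17/2010
-- 5/19/2010  3    C    2   5/17/2010
-- 5/20/2010  3    C    3   5/17/2010
```

Within one run, the date and the row number both advance by one per row, so GRP stays constant. When another value interrupts the run, the date jumps ahead but the row number does not, so the next run of the same values gets a different GRP.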

For SQL Server 2000, the same grouping can be done with a derived table and a correlated subquery:

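    SELECT COL1, COL2, MIN(DATE) AS STARTDATE, MAX(DATE) AS ENDDATE
    FROM (
        SELECT COL1, COL2, DATE,
               -- GRP is the end date of the run containing this row: the earliest
               -- date on or after it that has no matching row on the following day
               (SELECT MIN(DATE)
                FROM YOUR_TABLE B
                WHERE B.DATE >= A.DATE
                  AND B.COL1 = A.COL1
                  AND B.COL2 = A.COL2
                  AND NOT EXISTS (SELECT *
                                  FROM YOUR_TABLE C
                                  WHERE C.COL1 = B.COL1
                                    AND C.COL2 = B.COL2
                                    AND DATEDIFF(DAY, B.DATE, C.DATE) = 1)
               ) AS GRP
        FROM YOUR_TABLE A
    ) X  -- T-SQL requires an alias on a derived table
    GROUP BY COL1, COL2, GRP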

Andomar · Answer 2 · 2010-05-15T09:42:43+0000

Here is one approach using OUTER APPLY (available in SQL Server 2005 and later). Replace @t with the name of your table.

    SELECT head.date, last.date, head.col1, head.col2
    FROM @t head
    OUTER APPLY (
        -- the row immediately before "head"
        SELECT TOP 1 *
        FROM @t t
        WHERE t.date < head.date
        ORDER BY t.date DESC
    ) prev
    OUTER APPLY (
        -- the first later row whose values differ from "head"
        SELECT TOP 1 *
        FROM @t t
        WHERE t.date > head.date
          AND (t.col1 <> head.col1 OR t.col2 <> head.col2)
        ORDER BY t.date
    ) next
    OUTER APPLY (
        -- the latest matching row before "next"; DESC is needed to pick the end of the run
        SELECT TOP 1 *
        FROM @t t
        WHERE (t.date < next.date OR next.date IS NULL)
          AND (t.col1 = head.col1 AND t.col2 = head.col2)
        ORDER BY t.date DESC
    ) last
    WHERE (prev.col1 IS NULL OR head.col1 <> prev.col1 OR head.col2 <> prev.col2)

First, the query selects the "head" rows: the rows that start a new run of col1, col2 values. It does this by looking up the previous row and requiring, in the WHERE clause, that the previous row is missing or has different values.

It then searches for the end of the run. This is a two-step process: first find the first row of the "next" run, then take the row just before it as the "last" row.

    Date       Col1  Col2
    ...
    5/15/2010  2     B     <-- "prev" row
    5/16/2010  1     A     <-- "head" row
    5/17/2010  1     A     <-- "last" row
    5/18/2010  3     C     <-- "next" row
    ...
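To try the "head" step described above on its own, a minimal sketch along these lines may help. It loads the question's sample rows into a table variable (the column types and the StartDate alias are assumptions for this example) and keeps only the rows whose previous row is missing or different.

```sql
-- Sample data from the question. Column types are assumed for illustration.
DECLARE @t TABLE ([Date] DATETIME, Col1 INT, Col2 CHAR(1));

INSERT INTO @t ([Date], Col1, Col2)
SELECT '20100513', 1, 'A' UNION ALL
SELECT '20100514', 1, 'A' UNION ALL
SELECT '20100515', 2, 'B' UNION ALL
SELECT '20100516', 1, 'A' UNION ALL
SELECT '20100517', 1, 'A' UNION ALL
SELECT '20100518', 3, 'C' UNION ALL
SELECT '20100519', 3, 'C' UNION ALL
SELECT '20100520', 3, 'C';

-- Keep only the rows that start a run: no previous row, or a previous row
-- with different Col1/Col2 values. This relies on Col1 never being NULL.
SELECT head.[Date] AS StartDate, head.Col1, head.Col2
FROM @t head
OUTER APPLY (
    SELECT TOP 1 *
    FROM @t t
    WHERE t.[Date] < head.[Date]
    ORDER BY t.[Date] DESC
) prev
WHERE prev.Col1 IS NULL
   OR head.Col1 <> prev.Col1
   OR head.Col2 <> prev.Col2
ORDER BY head.[Date];

-- StartDate  Col1 Col2
-- 5/13/2010  1    A
-- 5/15/2010  2    B
-- 5/16/2010  1    A
-- 5/18/2010  3    C
```

These four rows are the start dates of the four runs in the question's expected output; the "next" and "last" lookups then attach an end date to each of them.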

The result of the query corresponds to the output of the example in your question.
