Group by column with multiple groups of duplicate data

Question

Group by column with multiple groups of duplicate data

I need to group some data based on dates in places, including identification when a date range does not take place. I am a little of the fact that I was able to create a list of ALL dates in the range and location.

date1 location1
date2 location1
date3 location1
date4 Unknown
date5 Unknown
date6 Unknown
date7 Location2
date8 Location2
date9 Location2
date10 Location2
date11 location1
date12 location1
date13 location1

using a regular group (showing min (date) and max (date), I get something like:

Location1, date1, date13
LOCATION2, date7, date10
Unknown, date4, date6

But I want this:

Location1, date1, date3
Unknown, date4, date6
LOCATION2, date7, date9
Location1, date11, date13

I also need to filter out short ranges of the Unknown, but the second.

I hope this makes sense, it seems like something that should be very simple.

+4

date tsql range gaps-and-islands grouping

Deadeye Oct 17 '12 at 16:25

source share

1 answer

Anssssss · Answer 1 · 2012-10-17T19:20:59+0000

Take a look at the issue of islands and spaces and Itzik Ben-gan. There is a kit based on the method of obtaining the desired results.

I studied using ROW_NUMBER or RANK, but then I came across LAG and LEAD (introduced in SQL 2012), which are good. I have a solution below. This could definitely be simplified, but bearing in mind that several CTEs make my thinking process (as if mistaken, as it may be) easier to see. I just slowly transform the data into what I want. Uncomment one choice at a time if you want to see what each new CTE produces.

create table Junk (aDate Datetime, aLocation varchar(32)) insert into Junk values ('2000', 'Location1'), ('2001', 'Location1'), ('2002', 'Location1'), ('2004', 'Unknown'), ('2005', 'Unknown'), ('2006', 'Unknown'), ('2007', 'Location2'), ('2008', 'Location2'), ('2009', 'Location2'), ('2010', 'Location2'), ('2011', 'Location1'), ('2012', 'Location1'), ('2013', 'Location1'), ('2014', 'Location3') ;WITH StartsMiddlesAndEnds AS ( select aLocation, aDate, CASE(LAG(aLocation) OVER (ORDER BY aDate, aLocation)) WHEN aLocation THEN 0 ELSE 1 END [isStart], CASE(LEAD(aLocation) OVER (ORDER BY aDate, aLocation)) WHEN aLocation THEN 0 ELSE 1 END [isEnd] from Junk ) --select * from NumberedStartsMiddlesAndEnds ,NumberedStartsAndEnds AS --let get rid of the rows that are in the middle of consecutive date groups ( select aLocation, aDate, isStart, isEnd, ROW_NUMBER() OVER(ORDER BY aDate, aLocation) i FROM StartsMiddlesAndEnds WHERE NOT(isStart = 0 AND isEnd = 0) --it is a middle row ) --select * from NumberedStartsAndEnds ,CombinedStartAndEnds AS --now let put the start and end dates in the same row ( select rangeStart.aLocation, rangeStart.aDate [aStart], rangeEnd.aDate [aEnd] FROM NumberedStartsAndEnds rangeStart join NumberedStartsAndEnds rangeEnd ON rangeStart.aLocation = rangeEnd.aLocation WHERE rangeStart.i = rangeEnd.i - 1 --consecutive rows and rangeStart.isStart = 1 and rangeEnd.isEnd = 1 ) --select * from CombinedStartAndEnds ,OneDateIntervals AS --don't forget the cases where a single row is both a start and end ( select aLocation, aDate [aStart], aDate [aEnd] FROM NumberedStartsAndEnds WHERE isStart = 1 and isEnd = 1 ) --select * from OneDateIntervals select aLocation, DATEPART(YEAR, aStart) [start], DATEPART(YEAR, aEnd) [end] from OneDateIntervals UNION select aLocation, DATEPART(YEAR, aStart) [start], DATEPART(YEAR, aEnd) [end] from CombinedStartAndEnds ORDER BY DATEPART(YEAR, aStart)

and he produces

 aLocation start end Location1 2000 2002 Unknown 2004 2006 Location2 2007 2010 Location1 2011 2013 Location3 2014 2014

You do not have 2012? Then you can get the same StartSMiddlesAndEnds CTE using ROW_NUMBER:

 ;WITH NumberedRows AS ( SELECT aLocation, aDate, ROW_NUMBER() OVER (ORDER BY aDate, aLocation) [i] FROM Junk ) ,StartsMiddlesAndEnds AS ( select currentRow.aLocation, currentRow.aDate, CASE upperRow.aLocation WHEN currentRow.aLocation THEN 0 ELSE 1 END [isStart], CASE lowerRow.aLocation WHEN currentRow.aLocation THEN 0 ELSE 1 END [isEnd] from NumberedRows currentRow left outer join NumberedRows upperRow on upperRow.i = currentRow.i-1 left outer join NumberedRows lowerRow on lowerRow.i = currentRow.i+1 ) --select * from StartsMiddlesAndEnds

Group by column with multiple groups of duplicate data

More articles: