Group by column with multiple groups of duplicate data

I need to group some data based on dates in places, including identification when a date range does not take place. I am a little of the fact that I was able to create a list of ALL dates in the range and location.

  • date1 location1
  • date2 location1
  • date3 location1
  • date4 Unknown
  • date5 Unknown
  • date6 Unknown
  • date7 Location2
  • date8 Location2
  • date9 Location2
  • date10 Location2
  • date11 location1
  • date12 location1
  • date13 location1

using a regular group (showing min (date) and max (date), I get something like:

  • Location1, date1, date13
  • LOCATION2, date7, date10
  • Unknown, date4, date6

But I want this:

  • Location1, date1, date3
  • Unknown, date4, date6
  • LOCATION2, date7, date9
  • Location1, date11, date13

I also need to filter out short ranges of the Unknown, but the second.

I hope this makes sense, it seems like something that should be very simple.

+4
source share
1 answer

Take a look at the issue of islands and spaces and Itzik Ben-gan. There is a kit based on the method of obtaining the desired results.

I studied using ROW_NUMBER or RANK, but then I came across LAG and LEAD (introduced in SQL 2012), which are good. I have a solution below. This could definitely be simplified, but bearing in mind that several CTEs make my thinking process (as if mistaken, as it may be) easier to see. I just slowly transform the data into what I want. Uncomment one choice at a time if you want to see what each new CTE produces.

create table Junk (aDate Datetime, aLocation varchar(32)) insert into Junk values ('2000', 'Location1'), ('2001', 'Location1'), ('2002', 'Location1'), ('2004', 'Unknown'), ('2005', 'Unknown'), ('2006', 'Unknown'), ('2007', 'Location2'), ('2008', 'Location2'), ('2009', 'Location2'), ('2010', 'Location2'), ('2011', 'Location1'), ('2012', 'Location1'), ('2013', 'Location1'), ('2014', 'Location3') ;WITH StartsMiddlesAndEnds AS ( select aLocation, aDate, CASE(LAG(aLocation) OVER (ORDER BY aDate, aLocation)) WHEN aLocation THEN 0 ELSE 1 END [isStart], CASE(LEAD(aLocation) OVER (ORDER BY aDate, aLocation)) WHEN aLocation THEN 0 ELSE 1 END [isEnd] from Junk ) --select * from NumberedStartsMiddlesAndEnds ,NumberedStartsAndEnds AS --let get rid of the rows that are in the middle of consecutive date groups ( select aLocation, aDate, isStart, isEnd, ROW_NUMBER() OVER(ORDER BY aDate, aLocation) i FROM StartsMiddlesAndEnds WHERE NOT(isStart = 0 AND isEnd = 0) --it is a middle row ) --select * from NumberedStartsAndEnds ,CombinedStartAndEnds AS --now let put the start and end dates in the same row ( select rangeStart.aLocation, rangeStart.aDate [aStart], rangeEnd.aDate [aEnd] FROM NumberedStartsAndEnds rangeStart join NumberedStartsAndEnds rangeEnd ON rangeStart.aLocation = rangeEnd.aLocation WHERE rangeStart.i = rangeEnd.i - 1 --consecutive rows and rangeStart.isStart = 1 and rangeEnd.isEnd = 1 ) --select * from CombinedStartAndEnds ,OneDateIntervals AS --don't forget the cases where a single row is both a start and end ( select aLocation, aDate [aStart], aDate [aEnd] FROM NumberedStartsAndEnds WHERE isStart = 1 and isEnd = 1 ) --select * from OneDateIntervals select aLocation, DATEPART(YEAR, aStart) [start], DATEPART(YEAR, aEnd) [end] from OneDateIntervals UNION select aLocation, DATEPART(YEAR, aStart) [start], DATEPART(YEAR, aEnd) [end] from CombinedStartAndEnds ORDER BY DATEPART(YEAR, aStart) 

and he produces

 aLocation start end Location1 2000 2002 Unknown 2004 2006 Location2 2007 2010 Location1 2011 2013 Location3 2014 2014 

You do not have 2012? Then you can get the same StartSMiddlesAndEnds CTE using ROW_NUMBER:

 ;WITH NumberedRows AS ( SELECT aLocation, aDate, ROW_NUMBER() OVER (ORDER BY aDate, aLocation) [i] FROM Junk ) ,StartsMiddlesAndEnds AS ( select currentRow.aLocation, currentRow.aDate, CASE upperRow.aLocation WHEN currentRow.aLocation THEN 0 ELSE 1 END [isStart], CASE lowerRow.aLocation WHEN currentRow.aLocation THEN 0 ELSE 1 END [isEnd] from NumberedRows currentRow left outer join NumberedRows upperRow on upperRow.i = currentRow.i-1 left outer join NumberedRows lowerRow on lowerRow.i = currentRow.i+1 ) --select * from StartsMiddlesAndEnds 
+1
source

Source: https://habr.com/ru/post/1440345/


All Articles