Combining date ranges in a SQL query

I am working on a query that has to combine several rows of data based on date ranges. These rows are duplicates in every data value except the date ranges. For example, the table data may look like

    StudentID  StartDate   EndDate     Field1  Field2
    1          9/3/2007    10/20/2007  3       True
    1          10/21/2007  6/12/2008   3       True
    2          10/10/2007  3/20/2008   4       False
    3          9/3/2007    11/3/2007   8       True
    3          12/15/2007  6/12/2008   8       True

The query result must have combined date ranges. The query should combine date ranges that are separated by a gap of just one day. If the gap is more than one day, the rows should not be combined. Rows that do not have a split date range should pass through unchanged. The result would look like

    StudentID  StartDate   EndDate     Field1  Field2
    1          9/3/2007    6/12/2008   3       True
    2          10/10/2007  3/20/2008   4       False
    3          9/3/2007    11/3/2007   8       True
    3          12/15/2007  6/12/2008   8       True

What would be the SELECT statement for this query?

10 answers

The following code should work. I made a few assumptions: there are no overlapping date ranges, there are no NULL values in any of the fields, and the start date for a given row is always less than the end date. If your data does not meet these criteria, you will need to adapt this method, but it should point you in the right direction.

You can use subqueries instead of views, but that can get cumbersome, so I used views to make the code easier to follow.

    CREATE VIEW dbo.StudentStartDates AS
    SELECT
        S.StudentID,
        S.StartDate,
        S.Field1,
        S.Field2
    FROM
        dbo.Students S
    LEFT OUTER JOIN dbo.Students PREV ON
        PREV.StudentID = S.StudentID AND
        PREV.Field1 = S.Field1 AND
        PREV.Field2 = S.Field2 AND
        PREV.EndDate = DATEADD(dy, -1, S.StartDate)
    WHERE PREV.StudentID IS NULL
    GO

    CREATE VIEW dbo.StudentEndDates AS
    SELECT
        S.StudentID,
        S.EndDate,
        S.Field1,
        S.Field2
    FROM
        dbo.Students S
    LEFT OUTER JOIN dbo.Students NEXT ON
        NEXT.StudentID = S.StudentID AND
        NEXT.Field1 = S.Field1 AND
        NEXT.Field2 = S.Field2 AND
        NEXT.StartDate = DATEADD(dy, 1, S.EndDate)
    WHERE NEXT.StudentID IS NULL
    GO

    SELECT
        SD.StudentID,
        SD.StartDate,
        ED.EndDate,
        SD.Field1,
        SD.Field2
    FROM
        dbo.StudentStartDates SD
    INNER JOIN dbo.StudentEndDates ED ON
        ED.StudentID = SD.StudentID AND
        ED.Field1 = SD.Field1 AND
        ED.Field2 = SD.Field2 AND
        ED.EndDate > SD.StartDate AND
        NOT EXISTS (
            SELECT *
            FROM dbo.StudentEndDates ED2
            WHERE ED2.StudentID = SD.StudentID AND
                  ED2.Field1 = SD.Field1 AND
                  ED2.Field2 = SD.Field2 AND
                  ED2.EndDate < ED.EndDate AND
                  ED2.EndDate > SD.StartDate)
    GO
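To try the start-date/end-date idea without a SQL Server instance, here is a rough sketch that ports the same views to SQLite through Python's sqlite3 module. The port is my own assumption, not part of the answer: dates are stored as ISO strings, DATEADD is replaced by SQLite's date() function, Field2 is stored as 0/1, and the aliases P and N stand in for PREV and NEXT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (
    StudentID INTEGER, StartDate TEXT, EndDate TEXT,
    Field1 INTEGER, Field2 INTEGER
);
-- Sample rows from the question, dates rewritten as ISO strings (assumption)
INSERT INTO Students VALUES
    (1, '2007-09-03', '2007-10-20', 3, 1),
    (1, '2007-10-21', '2008-06-12', 3, 1),
    (2, '2007-10-10', '2008-03-20', 4, 0),
    (3, '2007-09-03', '2007-11-03', 8, 1),
    (3, '2007-12-15', '2008-06-12', 8, 1);

-- Rows with no matching row ending the day before: these start a range
CREATE VIEW StudentStartDates AS
SELECT S.StudentID, S.StartDate, S.Field1, S.Field2
FROM Students S
LEFT JOIN Students P
       ON P.StudentID = S.StudentID
      AND P.Field1 = S.Field1
      AND P.Field2 = S.Field2
      AND P.EndDate = date(S.StartDate, '-1 day')
WHERE P.StudentID IS NULL;

-- Rows with no matching row starting the day after: these end a range
CREATE VIEW StudentEndDates AS
SELECT S.StudentID, S.EndDate, S.Field1, S.Field2
FROM Students S
LEFT JOIN Students N
       ON N.StudentID = S.StudentID
      AND N.Field1 = S.Field1
      AND N.Field2 = S.Field2
      AND N.StartDate = date(S.EndDate, '+1 day')
WHERE N.StudentID IS NULL;
""")

# Pair each range start with the nearest range end that follows it
merged = conn.execute("""
SELECT SD.StudentID, SD.StartDate, ED.EndDate, SD.Field1, SD.Field2
FROM StudentStartDates SD
JOIN StudentEndDates ED
  ON ED.StudentID = SD.StudentID
 AND ED.Field1 = SD.Field1
 AND ED.Field2 = SD.Field2
 AND ED.EndDate > SD.StartDate
 AND NOT EXISTS (
        SELECT 1 FROM StudentEndDates ED2
        WHERE ED2.StudentID = SD.StudentID
          AND ED2.Field1 = SD.Field1
          AND ED2.Field2 = SD.Field2
          AND ED2.EndDate < ED.EndDate
          AND ED2.EndDate > SD.StartDate)
ORDER BY SD.StudentID, SD.StartDate
""").fetchall()

for row in merged:
    print(row)
```

On the question's sample data this should merge the two rows for student 1 and leave students 2 and 3 as they were.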
+2
source

In my experience, I have had to combine ranges in post-processing (not in SQL, but in my script). I'm not sure SQL can do this, especially since you can't know in advance how many date ranges will need to be joined in any particular case. If it can be done, I'd like to know too.

EDIT: My answer assumes that you have several date ranges for each student, not just one start and end. If you have only one date range without gaps, then the other solutions mentioned are the way to go.
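For the post-processing route, a minimal Python sketch might look like the following. The function name merge_adjacent and the tuple layout (StudentID, StartDate, EndDate, Field1, Field2) are hypothetical choices for illustration. It assumes ranges do not overlap, and it handles any number of chained ranges.

```python
from datetime import date, timedelta

def merge_adjacent(rows):
    """Merge rows whose ranges are exactly one day apart.

    rows: iterable of (student_id, start, end, field1, field2) tuples,
    with start/end as datetime.date. Assumes no overlapping ranges.
    """
    # Sort so that rows of the same student/fields are adjacent, by start date
    ordered = sorted(rows, key=lambda r: (r[0], r[3], r[4], r[1]))
    merged = []
    for sid, start, end, f1, f2 in ordered:
        if merged:
            p_sid, p_start, p_end, p_f1, p_f2 = merged[-1]
            same_group = (p_sid, p_f1, p_f2) == (sid, f1, f2)
            if same_group and start == p_end + timedelta(days=1):
                # Extend the previous range instead of adding a new row
                merged[-1] = (p_sid, p_start, end, p_f1, p_f2)
                continue
        merged.append((sid, start, end, f1, f2))
    return merged

# Sample rows from the question
rows = [
    (1, date(2007, 9, 3), date(2007, 10, 20), 3, True),
    (1, date(2007, 10, 21), date(2008, 6, 12), 3, True),
    (2, date(2007, 10, 10), date(2008, 3, 20), 4, False),
    (3, date(2007, 9, 3), date(2007, 11, 3), 8, True),
    (3, date(2007, 12, 15), date(2008, 6, 12), 8, True),
]

for row in merge_adjacent(rows):
    print(row)
```

Because each row is compared only with the last merged row of its group, the number of ranges per student does not need to be known in advance.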

0
source
    SELECT StudentID, MIN(startdate) AS startdate, MAX(enddate) AS enddate,
           field1, field2
    FROM tablex
    GROUP BY StudentID, field1, field2

This will give you the result, on the assumption that there are no gaps between a student's date ranges.
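To see what the no-gap assumption means in practice, here is a small sketch that runs the same MIN/MAX grouping on the question's sample data in SQLite via Python's sqlite3 module. The table name Students, the ISO date strings, and the 0/1 encoding of Field2 are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (
    StudentID INTEGER, StartDate TEXT, EndDate TEXT,
    Field1 INTEGER, Field2 INTEGER
);
-- Sample rows from the question, dates rewritten as ISO strings (assumption)
INSERT INTO Students VALUES
    (1, '2007-09-03', '2007-10-20', 3, 1),
    (1, '2007-10-21', '2008-06-12', 3, 1),
    (2, '2007-10-10', '2008-03-20', 4, 0),
    (3, '2007-09-03', '2007-11-03', 8, 1),
    (3, '2007-12-15', '2008-06-12', 8, 1);
""")

# Plain MIN/MAX per student and field values, ignoring gaps between ranges
collapsed = conn.execute("""
SELECT StudentID, MIN(StartDate), MAX(EndDate), Field1, Field2
FROM Students
GROUP BY StudentID, Field1, Field2
ORDER BY StudentID
""").fetchall()

for row in collapsed:
    print(row)
```

Student 1 comes out as wanted, but student 3's two ranges are also collapsed into one even though they are more than a month apart, which the question asks to avoid.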

0
source
    select StudentID, min(StartDate) StartDate, max(EndDate) EndDate,
           Field1, Field2
    from table
    group by StudentID, Field1, Field2
0
source

If the min()/max() solutions are not good enough (for example, if the dates are not contiguous and you want to group separate date ranges separately), I wonder whether Oracle's START WITH and CONNECT BY would be of some use. That, of course, won't work in every database.
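I have not tried CONNECT BY itself, but the same "walk from a starting row to its successors" idea can be sketched with a recursive common table expression, which is a different technique that more databases support. The sketch below uses SQLite through Python's sqlite3 module. The table name Students, the ISO date strings, and the extra student 4 with three contiguous ranges are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (
    StudentID INTEGER, StartDate TEXT, EndDate TEXT,
    Field1 INTEGER, Field2 INTEGER
);
INSERT INTO Students VALUES
    -- sample rows from the question, as ISO date strings (assumption)
    (1, '2007-09-03', '2007-10-20', 3, 1),
    (1, '2007-10-21', '2008-06-12', 3, 1),
    (2, '2007-10-10', '2008-03-20', 4, 0),
    (3, '2007-09-03', '2007-11-03', 8, 1),
    (3, '2007-12-15', '2008-06-12', 8, 1),
    -- hypothetical student with three contiguous ranges
    (4, '2008-01-01', '2008-01-10', 5, 0),
    (4, '2008-01-11', '2008-01-20', 5, 0),
    (4, '2008-01-21', '2008-01-31', 5, 0);
""")

islands = conn.execute("""
WITH RECURSIVE chain(StudentID, StartDate, EndDate, Field1, Field2) AS (
    -- anchor: rows that have no row ending the day before them
    SELECT S.StudentID, S.StartDate, S.EndDate, S.Field1, S.Field2
    FROM Students S
    WHERE NOT EXISTS (
        SELECT 1 FROM Students P
        WHERE P.StudentID = S.StudentID
          AND P.Field1 = S.Field1
          AND P.Field2 = S.Field2
          AND P.EndDate = date(S.StartDate, '-1 day'))
    UNION ALL
    -- step: extend the chain with the row starting the day after it ends
    SELECT C.StudentID, C.StartDate, N.EndDate, C.Field1, C.Field2
    FROM chain C
    JOIN Students N
      ON N.StudentID = C.StudentID
     AND N.Field1 = C.Field1
     AND N.Field2 = C.Field2
     AND N.StartDate = date(C.EndDate, '+1 day')
)
-- the furthest end date reached from each starting row is the merged end
SELECT StudentID, StartDate, MAX(EndDate), Field1, Field2
FROM chain
GROUP BY StudentID, StartDate, Field1, Field2
ORDER BY StudentID, StartDate
""").fetchall()

for row in islands:
    print(row)
```

Like the other approaches, this assumes the ranges for a student do not overlap.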

0
source

EDIT: Created another set of SQL for Access. I tested all of this, but piecemeal, because I don't know how to run multiple statements at once in Access. Since I also don't know how to write comments in Access, see the comments in the SQL Server version below.

    select studentid, min(startdate) as Starter, max(enddate) as Ender,
           field1, field2,
           max(startDate) - Min(endDate) as MaxGap
    into tempIDs
    from student
    group by studentid, field1, field2;

    delete from tempIDs
    where MaxGap > 1;

    UPDATE student
    INNER JOIN TempIDs ON Student.studentID = TempIDS.StudentID
    SET Student.StartDate = [TempIDs].[Starter],
        Student.EndDate = [TempIDs].[Ender];

I think this is the equivalent in SQL Server; I did not write this version in Access. I have not tested it against unusual conditions such as multiple overlapping records, but it should get you started. It updates all the duplicate records that have a small gap between them, leaving the extra copies in the database. MSDN has a page on removing duplicates: http://support.microsoft.com/kb/139444

    select studentid, min(startdate) as StartDate, max(enddate) as EndDate,
           field1, field2,
           datediff(dd, Min(endDate), max(startDate)) as MaxGap
    into #tempIDs
    from #student
    group by studentid, field1, field2

    -- Update the relevant records. Keeps two copies of the massaged record
    -- - extra will need to be deleted.
    update #student
    set startdate = #TempIDS.startdate,
        enddate = #tempIDS.EndDate
    from #tempIDS
    where #student.studentid = #TempIDs.StudentID
      and MaxGap < 2
0
source

Have you considered a non-equi join? It would look something like this:

    SELECT A.StudentID, A.StartDate, A.EndDate, A.Field1, A.Field2
    FROM tblEnrollment AS A
    LEFT JOIN tblEnrollment AS B
      ON (A.StudentID = B.StudentID)
     AND (A.EndDate = B.StartDate - 1)
    WHERE B.StudentID Is Null;

What that gives you is all records that do not have a corresponding record starting one day after the first record's end date.

[Caveat: Beware that you can only edit a non-equi join in the Access query designer in SQL View. Switching to Design View may cause the join to be lost (although if you do switch, Access will tell you about the problem, and if you return to SQL View immediately, you will not lose it).]

If you then UNION that with this:

    SELECT A.StudentID, A.StartDate, B.EndDate, A.Field1, A.Field2
    FROM tblEnrollment AS A
    INNER JOIN tblEnrollment AS B
       ON (A.StudentID = B.StudentID)
      AND (A.EndDate = B.StartDate - 1)

It should give you what you need, assuming that there are never more than two contiguous records at a time. I'm not sure how you would do it if you had more than two contiguous records (it might involve comparing StartDate-1 with EndDate), but this might get you started in the right direction.

0
source

An alternative final query to the one provided by Tom H. in the accepted answer:

    SELECT SD.StudentID, SD.StartDate, MIN(ED.EndDate), SD.Field1, SD.Field2
    FROM dbo.StudentStartDates SD
    INNER JOIN dbo.StudentEndDates ED
       ON ED.StudentID = SD.StudentID
      AND ED.Field1 = SD.Field1
      AND ED.Field2 = SD.Field2
      AND ED.EndDate > SD.StartDate
    GROUP BY SD.StudentID, SD.Field1, SD.Field2, SD.StartDate

It also worked on all test data.

0
source

This is a classic problem in SQL (the language), covered for example in Joe Celko's books (chapter 23, "Regions, Runs, Gaps, Sequences, and Series") and in his latest book, "Thinking in Sets" (chapter 15).

While it's "fun" to correct data at runtime using a monster query, for me this is one of those situations that is better fixed offline and procedurally (personally, I would do it using formulas in an Excel spreadsheet).

It is important to create effective database constraints to prevent overlapping periods from recurring. Again, writing sequenced constraints in SQL is a classic: see Snodgrass ( http://www.cs.arizona.edu/people/rts/tdbbook.pdf ). Hint for MS Access users: you will need to use CHECK constraints.

0
source

Here is an example with test data using SQL Server 2005/2008 syntax.

    DECLARE @Data TABLE(
        CalendarDate datetime )

    INSERT INTO @Data( CalendarDate )
    -- range start
    SELECT '1 Jan 2010'
    UNION ALL SELECT '2 Jan 2010'
    UNION ALL SELECT '3 Jan 2010'
    -- range start
    UNION ALL SELECT '5 Jan 2010'
    -- range start
    UNION ALL SELECT '7 Jan 2010'
    UNION ALL SELECT '8 Jan 2010'
    UNION ALL SELECT '9 Jan 2010'
    UNION ALL SELECT '10 Jan 2010'

    SELECT DateGroup,
           Min( CalendarDate ) AS StartDate,
           Max( CalendarDate ) AS EndDate
    FROM(
        SELECT NextDay.CalendarDate,
               DateDiff( d, RangeStart.CalendarDate, NextDay.CalendarDate )
                   - ROW_NUMBER() OVER( ORDER BY NextDay.CalendarDate ) AS DateGroup
        FROM(
            SELECT Min( CalendarDate ) AS CalendarDate
            FROM @data ) AS RangeStart
        JOIN @data AS NextDay
            ON NextDay.CalendarDate >= RangeStart.CalendarDate ) A
    GROUP BY DateGroup
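The grouping idea in that query is that a day offset minus a row number stays constant within a run of consecutive days. The same idea can be sketched outside SQL in a few lines of Python. The function name date_ranges is hypothetical, and the sketch drops duplicate dates first, which is an assumption the query above does not make.

```python
from datetime import date
from itertools import groupby

def date_ranges(dates):
    """Collapse individual dates into (start, end) runs of consecutive days."""
    ordered = sorted(set(dates))  # de-duplicate so positions advance by one per day
    ranges = []
    # (day ordinal - position) is constant within a run of consecutive days
    runs = groupby(enumerate(ordered), key=lambda p: p[1].toordinal() - p[0])
    for _, run in runs:
        days = [d for _, d in run]
        ranges.append((days[0], days[-1]))
    return ranges

# Same test dates as in the query above
sample_dates = [
    date(2010, 1, 1), date(2010, 1, 2), date(2010, 1, 3),
    date(2010, 1, 5),
    date(2010, 1, 7), date(2010, 1, 8), date(2010, 1, 9), date(2010, 1, 10),
]

for start, end in date_ranges(sample_dates):
    print(start, end)
```

A single isolated date, such as 5 Jan 2010 here, comes out as a range whose start and end are the same day.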
0
source

Source: https://habr.com/ru/post/1277226/

