It is necessary to "merge" the record blocks in the request into one, depending on the other columns of the request

I have the following db schema.

CREATE TABLE Twix.dbo.Sensors (Id int PRIMARY KEY NOT NULL IDENTITY(1,1), Location nvarchar(260) NOT NULL);

 CREATE TABLE Twix.dbo.Visitors (Id int PRIMARY KEY NOT NULL IDENTITY(1,1), Name nvarchar(260) NOT NULL); CREATE TABLE Twix.dbo.Visits(Id int PRIMARY KEY NOT NULL IDENTITY(1,1), SensorId int CONSTRAINT FK__Visits__Sensor__239E FOREIGN KEY REFERENCES Sensors(Id), VisitorId int CONSTRAINT FK__Visits__Visitors__4DCF FOREIGN KEY REFERENCES Visitors(Id), InTime datetime NOT NULL, OutTime datetime NOT NULL); 

code>

InTime-OutTime periods in the Visites table can be crossed. That is part of the real visits. For instance,

SELECT VisitorId, InTime, OutTime FROM Visits ORDER BY VisitorId, InTime

 1 2011-02-09 15:26:59.173 2011-02-09 15:29:22.097 1 2011-02-09 15:28:40.530 2011-02-09 15:29:52.737 1 2011-02-09 15:30:10.577 2011-02-09 16:17:58.967 1 2011-02-09 19:09:23.057 2011-02-09 19:10:57.493 1 2011-02-09 19:12:16.087 2011-02-09 19:13:27.493 1 2011-02-14 15:08:46.333 2011-02-14 15:26:42.433 2 2011-02-09 17:55:42.390 2011-02-09 18:52:03.780 2 2011-02-09 18:10:56.727 2011-02-09 18:11:57.493 2 2011-02-09 21:47:15.650 2011-02-09 21:48:38.783 2 2011-02-09 21:50:18.337 2011-02-09 21:55:26.777 3 2011-02-09 17:12:58.103 2011-02-09 19:51:59.697 3 2011-02-09 22:27:52.073 2011-02-09 23:03:24.753 3 2011-02-09 23:02:51.177 2011-02-10 09:51:14.890 3 2011-02-14 15:27:42.270 2011-02-14 15:42:31.107 3 2011-02-14 15:43:37.320 2011-02-14 18:45:26.163 4 2011-02-09 21:07:51.030 2011-02-09 21:51:02.880 4 2011-02-09 22:42:52.660 2011-02-09 23:21:13.830 4 2011-02-09 23:23:08.563 2011-02-09 23:35:12.847 4 2011-02-09 23:36:05.120 2011-02-09 23:59:02.813 4 2011-02-10 05:58:44.103 2011-02-10 05:59:55.867 4 2011-02-12 08:29:36.620 2011-02-12 09:51:18.510 4 2011-02-12 13:13:42.650 2011-02-12 14:06:01.473 5 2011-02-10 06:48:52.717 2011-02-10 07:37:04.870 5 2011-02-10 06:50:31.067 2011-02-10 06:52:20.877 5 2011-02-10 06:52:36.273 2011-02-10 06:53:36.523 5 2011-02-10 06:59:11.790 2011-02-10 07:00:34.867 5 2011-02-10 08:36:39.563 2011-02-10 08:46:14.760 5 2011-02-10 12:47:05.567 2011-02-10 12:48:05.860 5 2011-02-10 12:49:19.590 2011-02-10 13:09:27.880 5 2011-02-10 12:49:25.733 2011-02-10 12:59:59.883 5 2011-02-10 12:55:23.460 2011-02-10 12:56:23.507 

code>

I need to get "completed visits", i.e. Combine all visits for a particular whistor if visits overlap or the time difference between them is less than 10 minutes (i.e., "InTime of visit2" - "OutTime of visit1" <10 min).

In C #, it looks like this:

private IEnumerable Merge(IEnumerable visits, uint holeInterval)

  { var vlist = new LinkedList<Visit>(visits.OrderBy(o => o.InTime)); var result = new List<Visit>(); while (vlist.Count > 1) { Visit a = vlist.First.Value; vlist.RemoveFirst(); Visit b = vlist.First.Value; vlist.RemoveFirst(); var r = Visit.Merge(a, b, holeInterval); // Merges two visits if (r != null) { vlist.AddFirst(r); } else { result.Add(a); vlist.AddFirst(b); } } result.Add(vlist.First.Value); return result; } public IEnumerable<Visit> ListCompleteVisits() { var result = new List<Visit>(); var queryResult = from visits in this.repository.ListVisits() group visits by visits.Visitor.Id into vgroup select Merge(vgroup, this.holeInterval); foreach (var v in queryResult) { result.AddRange(v); } return result; } 

code>

I tried and get the following:

DECLARE @holeInterval int SET @holeInterval = 10

 SELECT t.RowNumber, t.VisitorId, t.InTime, t.OutTime, t.BInMinusAOut, (SELECT MIN(InTime) FROM Visits AS D WHERE VisitorId NOT IN (select VisitorId from Visits group by VisitorId having COUNT(*) = 1) AND D.InTime <= t.InTime and D.VisitorId = t.VisitorId AND t.RowNumber > /*here should be max rownumber greater than holeinterval*/ ) AS MinInTime FROM (SELECT ROW_NUMBER() OVER(ORDER BY VisitorId, InTime ASC) AS RowNumber, VisitorId, InTime, OutTime, DATEDIFF(MI,InTime, (SELECT MIN(InTime) FROM Visits AS B WHERE B.InTime > A.InTime and A.VisitorId = B. VisitorId )) AS BInMinusAIn, DATEDIFF(MI,OutTime, (SELECT MIN(InTime) FROM Visits AS B WHERE B.InTime > A.InTime and A.VisitorId = B. VisitorId ))) AS BInMinusAOut FROM Visits AS A WHERE VisitorId NOT IN (select VisitorId from Visits group by VisitorId having COUNT(*) = 1) ) t /*WHERE t.BInMinusAOut > @holeInterval OR t.BInMinusAOut IS NULL*/ ORDER BY VisitorId, InTime 1 1 2011-02-09 15:26:59.173 2011-02-09 15:28:22.097 0 2 1 2011-02-09 15:28:40.530 2011-02-09 15:29:52.737 1 3 1 2011-02-09 15:30:10.577 2011-02-09 16:17:58.967 9 4 1 2011-02-09 16:26:44.810 2011-02-09 16:51:46.423 20 5 1 2011-02-09 17:11:57.633 2011-02-09 17:13:20.680 2 6 1 2011-02-09 17:15:35.727 2011-02-09 17:18:48.493 -2 7 1 2011-02-09 17:16:12.230 2011-02-09 17:42:47.867 3 8 1 2011-02-09 17:45:43.793 2011-02-09 17:52:10.860 3 9 1 2011-02-09 17:55:31.127 2011-02-09 20:13:22.743 -109 10 1 2011-02-09 18:24:00.427 2011-02-09 18:32:12.033 2 11 1 2011-02-09 18:34:15.877 2011-02-09 18:37:19.770 2 12 1 2011-02-09 18:39:46.440 2011-02-09 18:48:16.800 2 13 1 2011-02-09 18:50:59.270 2011-02-09 20:03:47.550 -54 14 1 2011-02-09 19:09:23.057 2011-02-09 19:10:57.493 2 15 1 2011-02-09 19:12:16.087 2011-02-09 19:13:27.493 48 

code>

Now I need to "drain" the 1st-4th on 2011-02-09 15: 26: 59.173 - 2011-02-09 16: 51: 46.423 5-15 hours on 2011-02-09 17: 11: 57.633 - 2011-02-09 20: 13: 22.743 This means that I have to accept min InTime in the lines that is between the current and the last, where MinInTime> @holeInterval and max OutTime for these ranges too.

Ie result:

 1 2011-02-09 15:26:59.173 2011-02-09 16:51:46.423 1 2011-02-09 17:11:57.633 2011-02-09 20:13:22.743 

code>

Thank x.

+4
source share
3 answers

Do not try to do everything in one big choice. Write a stored procedure that creates a temporary table using the syntax Select INTO #temp . Then modify / add a temporary table. Then, finally, select temp from the table as the output. Performance will be good and you can move on.

+1
source

The query was written using common table expressions to make it clearer what is happening, but you can simply UNION ALL execute 2 queries exactly for the same result.

What happens in the request:

Overlapping visis

the table is combined by extending InTime and OutTime by 10 minutes. This will select all overlapping Vist from the dataset. The CASE statement is then used to select the lowest value from two overlapping time slots as InTime and the highest value for OutTime. Since we still get some results by doing this, the group group with min / max is executed to filter the minimum and maximum timestamps for the full set of results.

NonOverlappingVisits

Like overlapping visits, time intervals extend by 10 minutes on each side to find overlapping parts and match with WHERE NOT EXISTS .

 Declare @holeinterval int SET @holeinterval = 10 ;WITH OverlappingVisits (VisitorId, InTime, OutTime) AS (select v1.VisitorId , InTime = MIN(CASE WHEN v1.InTime < v2.InTime THEN v1.InTime ELSE v2.InTime END) , OutTime = MAX(CASE WHEN v1.OutTime < v2.OutTime THEN v2.OutTime ELSE v1.OutTime END) FROM Visits v1 INNER JOIN Visits v2 ON v1.VisitorId = v2.VisitorId AND (v1.InTime BETWEEN dateadd(minute, -@holeinterval , v2.InTime) AND dateadd(minute, @holeinterval, v2.OutTime) OR v1.OutTime BETWEEN dateadd(minute, @holeinterval, v2.InTime) AND dateadd(minute, @holeinterval, v2.OutTime)) AND v1.Id <> v2.Id GROUP BY v1.VisitorId), NonOverlappingVisits (VisitorId, InTime, OutTime) AS ( SELECT v1.VisitorId , v1.InTime , v1.OutTime FROM Visits v1 WHERE NOT EXISTS(SELECT * FROM Visits v2 WHERE v1.VisitorId = v2.VisitorId AND (v1.InTime BETWEEN dateadd(minute, -@holeinterval , v2.InTime) AND dateadd(minute, @holeinterval, v2.OutTime) OR v1.OutTime BETWEEN dateadd(minute, @holeinterval, v2.InTime) AND dateadd(minute, @holeinterval, v2.OutTime)) AND v1.Id <> v2.Id)) SELECT * FROM OverlappingVisits UNION ALL SELECT * FROM NonOverlappingVisits ORDER BY VisitorId, InTime, OutTime 
+1
source
 ;WITH Visits_tuned AS ( /* adding some helper columns */ SELECT VisitorId, InTime, OutTime, OutTimeDelayed = DATEADD(minute, 10, OutTime), rownum = ROW_NUMBER() OVER (PARTITION BY VisitorId ORDER BY InTime, OutTime), VisitId = ROW_NUMBER() OVER (ORDER BY VisitorId, InTime, OutTime) FROM Visits ), Visits_starts AS ( /* spotting the starting points of the 'merged' visits */ SELECT DISTINCT v1.VisitId, IsStart = 1 - CASE WHEN v2.VisitId IS NULL THEN 0 ELSE 1 END FROM Visits_tuned v1 LEFT JOIN Visits_tuned v2 ON v1.InTime BETWEEN v2.InTime AND v2.OutTimeDelayed AND v1.VisitorId = v2.VisitorId AND v1.rownum <> v2.rownum ), Visits_rec AS ( /* basically, selecting the original data, but with InTime values replaced by the starting InTimes */ SELECT VisitId, VisitorId, InTime, OutTime FROM Visits_tuned WHERE VisitId = 1 UNION ALL SELECT v.VisitId, v.VisitorId, IntTime = CASE WHEN v.VisitorId = r.VisitorId AND s.IsStart = 0 AND r.InTime < v.InTime THEN r.InTime ELSE v.InTime END, v.OutTime FROM Visits_tuned v INNER JOIN Visits_rec r ON v.VisitId = r.VisitId + 1 INNER JOIN Visits_starts s ON v.VisitId = s.VisitId ) /* main select; just grouping by visitor and in-time */ SELECT VisitorId, InTime, OutTime = MAX(OutTime) FROM Visits_rec GROUP BY VisitorId, InTime ORDER BY 1, 2 
+1
source

Source: https://habr.com/ru/post/1340021/


All Articles