Group records by consecutive dates when dates are not exactly consecutive

I have data containing dates. I am trying to group rows into runs of consecutive dates; however, the intervals between the dates are not constant. Here is an example:

    DateColumn              | Value
    ------------------------+-------
    2017-01-18 01:12:34.107 | 215426  <- batch no. 1
    2017-01-18 01:12:34.113 | 215636
    2017-01-18 01:12:34.623 | 123516
    2017-01-18 01:12:34.633 | 289926
    2017-01-18 04:58:42.660 | 259063  <- batch no. 2
    2017-01-18 04:58:42.663 | 261830
    2017-01-18 04:58:42.893 | 219835
    2017-01-18 04:58:42.907 | 250165
    2017-01-18 05:18:14.660 | 134253  <- batch no. 3
    2017-01-18 05:18:14.663 | 134257
    2017-01-18 05:18:14.667 | 134372
    2017-01-18 05:18:15.040 | 181679
    2017-01-18 05:18:15.043 | 226368
    2017-01-18 05:18:15.043 | 227070

The data is generated in batches, and each row within a batch takes several milliseconds to generate. I am trying to group the results as follows:

    Date1                   | Date2                   | Count
    ------------------------+-------------------------+------
    2017-01-18 01:12:34.107 | 2017-01-18 01:12:34.633 | 4
    2017-01-18 04:58:42.660 | 2017-01-18 04:58:42.907 | 4
    2017-01-18 05:18:14.660 | 2017-01-18 05:18:15.043 | 6

It is safe to assume that if two consecutive rows are more than 1 minute apart, they belong to different batches.

I tried solutions based on the ROW_NUMBER function, but they only work for strictly consecutive dates (where the interval between adjacent rows is fixed). How can I get the desired result when the interval varies?


Note that a batch can last much longer than a minute. For example, a batch can consist of ~3000 rows starting at 2017-01-01 00:00:00 and ending at 2017-01-01 00:05:00, with adjacent rows several tens or hundreds of milliseconds apart. What is certain is that batches are at least 1 minute apart.

3 answers

Try the following:

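    select min(t.dateColumn) date1,
           max(t.dateColumn) date2,
           count(*) [Count]
    from (
        select t.*,
               sum(val) over (order by t.dateColumn) grp
        from (
            select t.*,
                   -- 1 marks a row more than 1 minute after the previous row (a new batch).
                   -- DATEADD avoids the overflow DATEDIFF(ms, ...) raises for gaps over ~24 days.
                   case when t.dateColumn > dateadd(minute, 1,
                             lag(t.dateColumn, 1, t.dateColumn) over (order by t.dateColumn))
                        then 1 else 0 end val
            from your_table t
        ) t
    ) t
    group by grp
    order by grp;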

It produces the result shown in the question.

It uses the analytic function LAG() to flag the start of each new batch, based on the difference between the current DateColumn and the previous one. The analytic SUM() then turns those flags into a running batch number, and grouping by that number yields the desired aggregates.
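To make the three steps concrete, here is a minimal Python sketch of the same gaps-and-islands logic on the question's sample timestamps. It is not SQL Server code; `batch_ranges` is a hypothetical helper, and the one-minute threshold comes from the question.

```python
from datetime import datetime, timedelta
from itertools import accumulate, groupby


def batch_ranges(timestamps, gap=timedelta(minutes=1)):
    """Split timestamps into batches; return (first, last, count) per batch."""
    ts = sorted(timestamps)
    if not ts:
        return []
    # Step 1 (LAG + CASE): flag 1 where the gap to the previous row exceeds `gap`.
    # The first row is compared with itself, like LAG(x, 1, x), so its flag is 0.
    previous = [ts[0]] + ts[:-1]
    flags = [1 if cur - prev > gap else 0 for prev, cur in zip(previous, ts)]
    # Step 2 (SUM(val) OVER (ORDER BY ...)): the running total is the batch number.
    batch_ids = list(accumulate(flags))
    # Step 3 (GROUP BY grp): first timestamp, last timestamp and row count per batch.
    result = []
    for _, rows in groupby(zip(batch_ids, ts), key=lambda pair: pair[0]):
        batch = [t for _, t in rows]
        result.append((batch[0], batch[-1], len(batch)))
    return result


SAMPLE = [
    "2017-01-18 01:12:34.107", "2017-01-18 01:12:34.113",
    "2017-01-18 01:12:34.623", "2017-01-18 01:12:34.633",
    "2017-01-18 04:58:42.660", "2017-01-18 04:58:42.663",
    "2017-01-18 04:58:42.893", "2017-01-18 04:58:42.907",
    "2017-01-18 05:18:14.660", "2017-01-18 05:18:14.663",
    "2017-01-18 05:18:14.667", "2017-01-18 05:18:15.040",
    "2017-01-18 05:18:15.043", "2017-01-18 05:18:15.043",
]
parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in SAMPLE]

for first, last, count in batch_ranges(parsed):
    print(first, last, count)
```

Because the running sum only ever increases, rows of the same batch are adjacent after sorting, which is why a single `groupby` pass is enough.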

Rows whose gap to the previous row is within a few milliseconds of the one-minute threshold may be misclassified because of DATETIME rounding. From MSDN:

datetime values are rounded to increments of .000, .003, or .007 seconds.
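SQL Server's datetime stores the time of day as a count of 1/300-second ticks, which is where the .000/.003/.007 increments come from. The Python sketch below mimics that rounding, assuming round-half-up to the nearest tick and millisecond-precision input; `round_like_sql_datetime` is a hypothetical name, not a SQL Server API. It reproduces the documented examples: .999 rolls over to the next second, .995 through .998 become .997, .992 through .994 become .993, and .990 or .991 become .990.

```python
from datetime import datetime, timedelta


def round_like_sql_datetime(dt):
    """Approximate SQL Server DATETIME rounding for a millisecond-precision value.

    Assumption: the time of day is rounded half up to the nearest 1/300 s tick,
    then displayed to the nearest millisecond.
    """
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    ms = (dt - midnight) // timedelta(milliseconds=1)  # whole ms since midnight
    ticks = (ms * 3 + 5) // 10          # nearest 1/300 s tick, halves rounded up
    shown_ms = (ticks * 10 + 1) // 3    # tick converted back to the nearest ms
    return midnight + timedelta(milliseconds=shown_ms)


def parse(s):
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f")


for raw in ["1998-01-01 23:59:59.999", "1998-01-01 23:59:59.995",
            "1998-01-01 23:59:59.992", "1998-01-01 23:59:59.991"]:
    print(raw, "->", round_like_sql_datetime(parse(raw)))
```

Under this model, raw values 00:00:00.000 and 00:01:00.001 are stored exactly 60 seconds apart, so a "more than 1 minute" test treats them as the same batch even though the raw gap was slightly longer.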


Here is the same query rewritten using CTEs:

    WITH cte1 (DateColumn, ValueColumn) AS (
        -- Insert your query that returns a datetime column and any other column
        SELECT SomeDate, SomeValue
        FROM SomeTable
        WHERE SomeColumn IS NOT NULL
    ),
    cte2 AS (
        -- This query adds a column called "val" that contains
        -- 1 when current row date - previous row date > 1 minute
        -- 0 otherwise
        -- (DATEADD instead of DATEDIFF(MS, ...), which overflows for gaps over ~24 days)
        SELECT cte1.*,
               CASE WHEN DateColumn > DATEADD(MINUTE, 1,
                         LAG(DateColumn, 1, DateColumn) OVER (ORDER BY DateColumn))
                    THEN 1 ELSE 0 END AS val
        FROM cte1
    ),
    cte3 AS (
        -- This query adds a column called "grp" that numbers
        -- the groups using running sum over the "val" column
        SELECT cte2.*,
               SUM(val) OVER (ORDER BY DateColumn) AS grp
        FROM cte2
    )
    SELECT MIN(DateColumn) Date1, MAX(DateColumn) Date2, COUNT(ValueColumn) [Count]
    FROM cte3
    GROUP BY grp
    ORDER BY grp;
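Without a SQL Server instance at hand, the same CTE chain can be tried in SQLite from Python, as a rough sketch. This assumes SQLite 3.25 or later (for window functions) and a hypothetical table named `batches`. SQLite has no DATEDIFF, so the gap is computed from `julianday()` differences instead, and `COALESCE(LAG(x), x)` stands in for `LAG(x, 1, x)`.

```python
import sqlite3

# The question's sample rows as (DateColumn, Value) pairs.
ROWS = [
    ("2017-01-18 01:12:34.107", 215426), ("2017-01-18 01:12:34.113", 215636),
    ("2017-01-18 01:12:34.623", 123516), ("2017-01-18 01:12:34.633", 289926),
    ("2017-01-18 04:58:42.660", 259063), ("2017-01-18 04:58:42.663", 261830),
    ("2017-01-18 04:58:42.893", 219835), ("2017-01-18 04:58:42.907", 250165),
    ("2017-01-18 05:18:14.660", 134253), ("2017-01-18 05:18:14.663", 134257),
    ("2017-01-18 05:18:14.667", 134372), ("2017-01-18 05:18:15.040", 181679),
    ("2017-01-18 05:18:15.043", 226368), ("2017-01-18 05:18:15.043", 227070),
]

# Same structure as the T-SQL CTEs; julianday() differences (in days) are
# scaled to milliseconds to replace DATEDIFF(MS, ...).
QUERY = """
WITH cte2 AS (
    SELECT DateColumn, ValueColumn,
           CASE WHEN (julianday(DateColumn) - julianday(COALESCE(
                          LAG(DateColumn) OVER (ORDER BY DateColumn), DateColumn)))
                     * 86400000.0 > 60000
                THEN 1 ELSE 0 END AS val
    FROM batches
),
cte3 AS (
    SELECT cte2.*, SUM(val) OVER (ORDER BY DateColumn) AS grp
    FROM cte2
)
SELECT MIN(DateColumn) AS Date1, MAX(DateColumn) AS Date2, COUNT(ValueColumn) AS "Count"
FROM cte3
GROUP BY grp
ORDER BY grp
"""


def run(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE batches (DateColumn TEXT, ValueColumn INTEGER)")
        con.executemany("INSERT INTO batches VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in run(ROWS):
    print(row)
```

Storing the timestamps as ISO-formatted text keeps MIN/MAX on strings consistent with chronological order, which is why no conversion is needed in the final SELECT.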

Truncate DateColumn to the minute (remove seconds and milliseconds) and group by the result:

  select min(DateColumn), max(DateColumn), count(*) from Yourtable group by DATEADD(MINUTE, DATEDIFF(MINUTE, 0, DateColumn), 0) 

Here are some related questions about truncating seconds from a datetime:

Truncate seconds and milliseconds in SQL

The way to retrieve DateTime data without seconds


This does not work if you need to compare the gaps between dates (60 seconds). But you can try it if you only need to group the records that fall within the same clock minute.

 SELECT [Date1] = MIN([DateColumn]) ,[Date2] = MAX([DateColumn]) ,[Count] = COUNT([DateColumn]) FROM [my_table] GROUP BY DATEADD(mi, DATEDIFF(mi, 0, [DateColumn]), 0); 
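A short Python sketch of why truncating to the minute differs from the gap rule, assuming a hypothetical batch like the one described in the question (one row every 100 ms from 00:00:00 to 00:05:00). `truncate_to_minute` mirrors `DATEADD(MINUTE, DATEDIFF(MINUTE, 0, x), 0)`; both helper names are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def truncate_to_minute(dt):
    # Python counterpart of DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0)
    return dt.replace(second=0, microsecond=0)


def count_gap_batches(timestamps, gap=timedelta(minutes=1)):
    # A new batch starts at the first row and after every gap longer than `gap`.
    ts = sorted(timestamps)
    return sum(1 for i, t in enumerate(ts) if i == 0 or t - ts[i - 1] > gap)


# Hypothetical batch: one row every 100 ms from 00:00:00 to 00:05:00 (3001 rows).
start = datetime(2017, 1, 1)
long_batch = [start + timedelta(milliseconds=100 * i) for i in range(3001)]

by_minute = Counter(truncate_to_minute(t) for t in long_batch)
print("groups by minute:", len(by_minute))
print("batches by gap rule:", count_gap_batches(long_batch))
```

Truncation splits this single batch into six groups, the last one holding only the 00:05:00.000 row, while the gap rule keeps it as one batch. On the question's sample data both approaches happen to agree, because none of the three batches crosses a minute boundary.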

Source: https://habr.com/ru/post/1262997/

