Grouping Rows by Date Range

Question

I am using SQL Server 2008 and need to create a query that groups rows that fall within a date range.

My table is as follows:

ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME

My rules:

If a row's WH_OUT_DATETIME is on or within 24 hours of the WH_IN_DATETIME of another ADM_ID with the same WH_PID, the two rows belong to the same group.

If possible, I would like to add another column to the results, EP_ID, that identifies the group.

For example:

 ADM_ID WH_PID WH_IN_DATETIME      WH_OUT_DATETIME
 ------ ------ ------------------- -------------------
 1      9      2014-10-12 00:00:00 2014-10-13 15:00:00
 2      9      2014-10-14 14:00:00 2014-10-15 15:00:00
 3      9      2014-10-16 14:00:00 2014-10-17 15:00:00
 4      9      2014-11-20 00:00:00 2014-11-21 00:00:00
 5      5      2014-10-17 00:00:00 2014-10-18 00:00:00

This should return the following rows:

 ADM_ID WH_PID EP_ID EP_IN_DATETIME      EP_OUT_DATETIME     WH_IN_DATETIME      WH_OUT_DATETIME
 ------ ------ ----- ------------------- ------------------- ------------------- -------------------
 1      9      1     2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-12 00:00:00 2014-10-13 15:00:00
 2      9      1     2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-14 14:00:00 2014-10-15 15:00:00
 3      9      1     2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-16 14:00:00 2014-10-17 15:00:00
 4      9      2     2014-11-20 00:00:00 2014-11-20 00:00:00 2014-10-16 14:00:00 2014-11-21 00:00:00
 5      5      1     2014-10-17 00:00:00 2014-10-18 00:00:00 2014-10-17 00:00:00 2014-10-18 00:00:00

EP_OUT_DATETIME will always be the last date in the group. I hope this clarifies things a bit. That way I can group by EP_ID and find the EP_OUT_DATETIME and the start time for any ADM_ID / WH_PID that falls inside the group.

Each row must chain to the next: if another row for the same WH_PID has a WH_IN_DATETIME that follows this row's WH_OUT_DATETIME within the window, then that other row's WH_OUT_DATETIME becomes the EP_OUT_DATETIME for every row of that WH_PID in this EP_ID.

Hope this makes sense.

Thanks mr

+6

sql sql-server sql-server-2008

user4283270 Nov 23 '14 at 1:32

5 answers

I would do this with EXISTS and a correlated subquery:

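 select t.*,
        (case when exists (select 1
                           from yourtable t2
                           where t2.WH_PID = t.WH_PID
                             and t2.ADM_ID <> t.ADM_ID  -- compare against other rows only
                             -- t's OUT falls between t2's IN and one day after t2's OUT
                             and t.WH_OUT_DATETIME between t2.WH_IN_DATETIME
                                                       and dateadd(day, 1, t2.WH_OUT_DATETIME)
                          )
              then 1 else 0
         end) as TimeFrameFlag
 from yourtable t;  -- yourtable is a placeholder for the real table name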

+3

Gordon linoff Nov 23 '14 at 2:56

Try this query:

 ;WITH cte
      AS (-- Anchor: rows whose OUT is not within 24 hours of any other IN for the same WH_PID
          SELECT t1.ADM_ID AS EP_ID, *
          FROM   @yourtable t1
          WHERE  NOT EXISTS (SELECT 1
                             FROM   @yourtable t2
                             WHERE  t1.WH_PID = t2.WH_PID
                                    AND t1.ADM_ID <> t2.ADM_ID
                                    AND Abs(Datediff(HH, t1.WH_OUT_DATETIME, t2.WH_IN_DATETIME)) <= 24)
          UNION ALL
          -- Recursive step: pull in rows whose OUT is within 24 hours of a grouped row's IN
          SELECT t2.EP_ID, t1.ADM_ID, t1.WH_PID, t1.WH_IN_DATETIME, t1.WH_OUT_DATETIME
          FROM   @yourtable t1
                 JOIN cte t2
                   ON t1.WH_PID = t2.WH_PID
                      AND t1.ADM_ID <> t2.ADM_ID
                      AND Abs(( Datediff(HH, t2.WH_IN_DATETIME, t1.WH_OUT_DATETIME) )) <= 24),
      cte_result
      AS (SELECT t1.*,
                 Dense_rank() OVER (partition BY wh_pid
                                    ORDER BY t1.WH_PID, ISNULL(t2.EP_ID, t1.ADM_ID)) AS EP_ID
          FROM   @yourtable t1
                 LEFT OUTER JOIN (SELECT DISTINCT ADM_ID, EP_ID
                                  FROM   cte) t2
                              ON t1.ADM_ID = t2.ADM_ID)
 SELECT ADM_ID,
        WH_PID,
        EP_ID,
        Min(WH_IN_DATETIME) OVER (partition BY wh_pid, ep_id)  AS [EP_IN_DATETIME],
        Max(WH_OUT_DATETIME) OVER (partition BY wh_pid, ep_id) AS [EP_OUT_DATETIME],
        WH_IN_DATETIME,
        WH_OUT_DATETIME
 FROM   cte_result
 ORDER  BY ADM_ID

I made these assumptions:

Rows that satisfy your rule form a group.
The group's min(WH_IN_DATETIME) is shown in the EP_IN_DATETIME column for all rows of that group. Similarly, the group's max(WH_OUT_DATETIME) is shown in the EP_OUT_DATETIME column for all rows belonging to that group.
EP_ID is assigned to the groups of each WH_PID separately.
One thing your question does not explain is why EP_OUT_DATETIME and WH_IN_DATETIME in the fourth row are 2014-11-20 00:00:00 and 2014-10-16 14:00:00 respectively. I assume this is a typo and that they should be 2014-11-21 00:00:00.000 and 2014-11-20 00:00:00.000 .

Explanation:

The first CTE, cte, returns the possible groups based on your rule. The second CTE, cte_result, assigns an EP_ID to each group. The final SELECT then computes min(WH_IN_DATETIME) and max(WH_OUT_DATETIME) over each (wh_pid, ep_id) partition.

sqlfiddle

+3

Deepak pawar Nov 26 '14 at 13:13

Here is another alternative, though it may not exactly match your expected results.

I agree with @NoDisplayName that there appears to be an error in your expected output for ADM_ID 5: the two OUT dates should match, or at least that seems logical to me. I cannot see why you would ever want an IN date shown in an OUT date value, but of course there could be a good reason. :)

In addition, the wording of your question makes it sound as though this is only part of the problem and that you will process this output further. I'm not sure what you are ultimately aiming for, but I split the query below into two CTEs, and you may find the final information you need in the second CTE (it seems you want to group the data back together).

Here is the full schema and query on SQL Fiddle:

 -- The Cross Join ensures we always have a pair of first and last time pairs
 -- The left join matches all overlapping combinations,
 -- allowing the where clause to restrict to just the first and last
 -- These first/last pairs are then grouped in the first CTE
 -- Data is restricted in the second CTE
 -- The final select is then quite simple
 With GroupedData AS
 (
   SELECT (Row_Number() OVER (ORDER BY t1.WH_PID, t1.WH_IN_DATETIME) - 1) / 2 Grp,
          t1.WH_IN_DATETIME,
          t1.WH_OUT_DATETIME,
          t1.WH_PID
   FROM   yourtable t1
   CROSS JOIN (SELECT 0 AS [First] UNION SELECT 1) SetOrder
   LEFT OUTER JOIN yourtable t2
          ON t1.WH_PID = t2.WH_PID
         AND ((DATEADD(d,1,t1.WH_OUT_DATETIME) BETWEEN t2.WH_IN_DATETIME AND t2.WH_OUT_DATETIME
               AND [First] = 0)
           OR (DATEADD(d,1,t2.WH_OUT_DATETIME) BETWEEN t1.WH_IN_DATETIME AND t1.WH_OUT_DATETIME
               AND [First] = 1))
   WHERE  t2.WH_PID IS NULL
 ),
 RestrictedData AS
 (
   SELECT WH_PID,
          MIN(WH_IN_DATETIME) AS WH_IN_DATETIME,
          MAX(WH_OUT_DATETIME) AS WH_OUT_DATETIME
   FROM   GroupedData
   GROUP BY Grp, WH_PID
 )
 SELECT yourtable.ADM_ID,
        yourtable.WH_PID,
        RestrictedData.WH_IN_DATETIME AS EP_IN_DATETIME,
        RestrictedData.WH_OUT_DATETIME AS EP_OUT_DATETIME,
        yourtable.WH_IN_DATETIME,
        yourtable.WH_OUT_DATETIME
 FROM   RestrictedData
 INNER JOIN yourtable
         ON RestrictedData.WH_PID = yourtable.WH_PID
        AND yourtable.WH_IN_DATETIME BETWEEN RestrictedData.WH_IN_DATETIME
                                         AND RestrictedData.WH_OUT_DATETIME
 ORDER BY yourtable.ADM_ID

+2

Scott C Nov 30 '14 at 0:49

A Left Outer Join and the DateDiff function should help you filter the rows. Then use a window function to create the group IDs.

 create table #test (ADM_ID int, WH_PID int, WH_IN_DATETIME DATETIME, WH_OUT_DATETIME DATETIME)

 INSERT #test
 VALUES (1, 9,  '2014-10-12 00:00:00', '2014-10-13 15:00:00'),
        (2, 9,  '2014-10-14 14:00:00', '2014-10-15 15:00:00'),
        (3, 9,  '2014-10-16 14:00:00', '2014-10-17 15:00:00'),
        (1, 10, '2014-10-16 14:00:00', '2014-10-17 15:00:00'),
        (2, 10, '2014-10-18 14:00:00', '2014-10-19 15:00:00')

 SELECT Row_number() OVER (partition by a.WH_PID ORDER BY a.WH_IN_DATETIME) Group_Id,
        a.WH_PID,
        a.WH_IN_DATETIME,
        b.WH_OUT_DATETIME
 FROM   #test a
        LEFT JOIN #test b
               ON a.WH_PID = b.WH_PID
              AND a.ADM_ID <> b.ADM_ID
 where  Datediff(hh, a.WH_OUT_DATETIME, b.WH_IN_DATETIME) BETWEEN 0 AND 24

OUTPUT:

 Group_Id WH_PID WH_IN_DATETIME          WH_OUT_DATETIME
 -------- ------ ----------------------- -----------------------
 1        9      2014-10-12 00:00:00.000 2014-10-15 15:00:00.000
 2        9      2014-10-14 14:00:00.000 2014-10-17 15:00:00.000
 1        10     2014-10-16 14:00:00.000 2014-10-19 15:00:00.000
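
One detail worth checking when a 24-hour rule is written with Datediff(hh, ...): DATEDIFF counts the hour boundaries crossed between the two values, not the elapsed time, so a gap slightly longer than 24 hours can still report 24. The sketch below uses made-up values (the variable names are hypothetical and not part of the answer above) to compare the DATEDIFF result with a range check built on DATEADD, which tests elapsed time exactly.

```sql
-- Hypothetical values: the next IN is 24 hours 59 minutes after the previous OUT.
DECLARE @PrevOut DATETIME, @NextIn DATETIME;
SET @PrevOut = '2014-10-13 15:00:00';
SET @NextIn  = '2014-10-14 15:59:00';

SELECT
    -- Number of hour boundaries crossed between the two values
    DATEDIFF(hh, @PrevOut, @NextIn) AS HourBoundaries,
    -- 1 only when @NextIn lies inside the exact 24-hour window after @PrevOut
    CASE WHEN @NextIn >= @PrevOut
          AND @NextIn <= DATEADD(hh, 24, @PrevOut)
         THEN 1 ELSE 0 END AS WithinExact24Hours;
```

If the rule must mean "no more than 24 elapsed hours", the DATEADD form is the safer comparison; if whole-hour granularity is acceptable, the DATEDIFF form may be enough.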

+1

P ரதீப் Nov 23 '14 at 2:28

Solomon rutzky · Accepted Answer · 2014-11-27T07:01:02+0000

Since the question does not state that the solution must be a single query ;-), here is another approach: the "quirky update" feature, which updates a variable at the same time as it updates a column. To break down the complexity of this operation, I create a scratch table to hold the piece that is hardest to calculate: EP_ID . Once that is done, it is joined into a simple query and provides the window for calculating the EP_IN_DATETIME and EP_OUT_DATETIME fields.

Steps:

Create the scratch table
Populate the scratch table with all of the ADM_ID values; this allows us to do an UPDATE since all rows already exist.
Update the scratch table
Do the final, simple SELECT joining the scratch table to the main table.

Test setup

 SET ANSI_NULLS ON;
 SET NOCOUNT ON;

 CREATE TABLE #Table
 (
   ADM_ID INT NOT NULL PRIMARY KEY,
   WH_PID INT NOT NULL,
   WH_IN_DATETIME DATETIME NOT NULL,
   WH_OUT_DATETIME DATETIME NOT NULL
 );

 INSERT INTO #Table VALUES (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00');
 INSERT INTO #Table VALUES (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00');
 INSERT INTO #Table VALUES (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00');
 INSERT INTO #Table VALUES (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00');
 INSERT INTO #Table VALUES (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');

Step 1. Create and populate a Scratch table

 CREATE TABLE #Scratch
 (
   ADM_ID INT NOT NULL PRIMARY KEY,
   EP_ID INT NOT NULL
   -- Might need WH_PID and WH_IN_DATETIME fields to guarantee proper UPDATE ordering
 );

 INSERT INTO #Scratch (ADM_ID, EP_ID)
   SELECT ADM_ID, 0
   FROM   #Table;

An alternative scratch table structure to ensure the correct update order (since the "quirky update" follows the clustered index order, as noted at the bottom of this answer):

 CREATE TABLE #Scratch
 (
   WH_PID INT NOT NULL,
   WH_IN_DATETIME DATETIME NOT NULL,
   ADM_ID INT NOT NULL,
   EP_ID INT NOT NULL
 );

 INSERT INTO #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID, EP_ID)
   SELECT WH_PID, WH_IN_DATETIME, ADM_ID, 0
   FROM   #Table;

 CREATE UNIQUE CLUSTERED INDEX [CIX_Scratch]
   ON #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID);

Step 2: update the Scratch Table using a local variable to track the previous value

 DECLARE @EP_ID INT; -- this is used in the UPDATE

 ;WITH cte AS
 (
   SELECT TOP (100) PERCENT
          t1.*,
          t2.WH_OUT_DATETIME AS [PriorOut],
          t2.ADM_ID AS [PriorID],
          ROW_NUMBER() OVER (PARTITION BY t1.WH_PID ORDER BY t1.WH_IN_DATETIME) AS [RowNum]
   FROM   #Table t1
   LEFT JOIN #Table t2
          ON t2.WH_PID = t1.WH_PID
         AND t2.ADM_ID <> t1.ADM_ID
         AND t2.WH_OUT_DATETIME >= (t1.WH_IN_DATETIME - 1)
         AND t2.WH_OUT_DATETIME < t1.WH_IN_DATETIME
   ORDER BY t1.WH_PID, t1.WH_IN_DATETIME
 )
 UPDATE sc
 SET    @EP_ID = sc.EP_ID = CASE
                              WHEN cte.RowNum = 1 THEN 1
                              WHEN cte.[PriorOut] IS NULL THEN (@EP_ID + 1)
                              ELSE @EP_ID
                            END
 FROM   #Scratch sc
 INNER JOIN cte
         ON cte.ADM_ID = sc.ADM_ID

Step 3: SELECT joining the Scratch Table

 SELECT tab.ADM_ID,
        tab.WH_PID,
        sc.EP_ID,
        MIN(tab.WH_IN_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID) AS [EP_IN_DATETIME],
        MAX(tab.WH_OUT_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID) AS [EP_OUT_DATETIME],
        tab.WH_IN_DATETIME,
        tab.WH_OUT_DATETIME
 FROM   #Table tab
 INNER JOIN #Scratch sc
         ON sc.ADM_ID = tab.ADM_ID
 ORDER BY tab.ADM_ID;

Resources

MSDN page for UPDATE
look for "@variable = column = expression"
Performance analysis of running totals (not quite the same as here, but not too far off)
This blog post mentions:
- PRO: this method is usually pretty fast
- CON: "The UPDATE order is controlled by the order of the clustered index." That behavior may rule out this method, depending on the circumstances. In this particular case, though, if the WH_PID values are not already naturally grouped by the clustered index order and ordered by WH_IN_DATETIME , then those two fields can simply be added to the scratch table, and the PK (with its implied clustered index) becomes (WH_PID, WH_IN_DATETIME, ADM_ID) .
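
For readers who have not seen the "@variable = column = expression" form before, here is a minimal sketch of it in isolation, computing a running total. The #Demo table, its columns, and the TABLOCKX / MAXDOP 1 hints are assumptions made for illustration; they are not part of the answer above. The processing order follows the clustered index only in practice and is not a documented guarantee, so treat the result as something to verify rather than rely on.

```sql
-- Hypothetical demo table; the clustered primary key on ID sets the practical update order.
CREATE TABLE #Demo
(
  ID INT NOT NULL PRIMARY KEY CLUSTERED,
  Amount INT NOT NULL,
  RunningTotal INT NOT NULL DEFAULT (0)
);

INSERT INTO #Demo (ID, Amount) VALUES (1, 10);
INSERT INTO #Demo (ID, Amount) VALUES (2, 20);
INSERT INTO #Demo (ID, Amount) VALUES (3, 5);

DECLARE @Total INT;
SET @Total = 0;

-- For each row: evaluate the expression, store it in the column,
-- and carry the same value forward in the variable for the next row.
UPDATE d
SET    @Total = d.RunningTotal = @Total + d.Amount
FROM   #Demo d WITH (TABLOCKX)
OPTION (MAXDOP 1);

SELECT ID, Amount, RunningTotal
FROM   #Demo
ORDER BY ID;

DROP TABLE #Demo;
```

The EP_ID update in Step 2 uses the same mechanism, except that the carried value is a group counter that is reset or incremented by a CASE expression instead of a sum.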
