Using GROUP BY with FIRST_VALUE and LAST_VALUE

I work with some data that is currently stored at 1 minute intervals, which looks like this:

    CREATE TABLE #MinuteData
        (
          [Id] INT ,
          [MinuteBar] DATETIME ,
          [Open] NUMERIC(12, 6) ,
          [High] NUMERIC(12, 6) ,
          [Low] NUMERIC(12, 6) ,
          [Close] NUMERIC(12, 6)
        );

    INSERT INTO #MinuteData
        ( [Id], [MinuteBar], [Open], [High], [Low], [Close] )
    VALUES
        ( 1, '2015-01-01 17:00:00', 1.557870, 1.557880, 1.557870, 1.557880 ),
        ( 2, '2015-01-01 17:01:00', 1.557900, 1.557900, 1.557880, 1.557880 ),
        ( 3, '2015-01-01 17:02:00', 1.557960, 1.558070, 1.557960, 1.558040 ),
        ( 4, '2015-01-01 17:03:00', 1.558080, 1.558100, 1.558040, 1.558050 ),
        ( 5, '2015-01-01 17:04:00', 1.558050, 1.558100, 1.558020, 1.558030 ),
        ( 6, '2015-01-01 17:05:00', 1.558580, 1.558710, 1.557870, 1.557950 ),
        ( 7, '2015-01-01 17:06:00', 1.557910, 1.558120, 1.557910, 1.557990 ),
        ( 8, '2015-01-01 17:07:00', 1.557940, 1.558250, 1.557940, 1.558170 ),
        ( 9, '2015-01-01 17:08:00', 1.558140, 1.558200, 1.558080, 1.558120 ),
        ( 10, '2015-01-01 17:09:00', 1.558110, 1.558140, 1.557970, 1.557970 );

    SELECT * FROM #MinuteData;

    DROP TABLE #MinuteData;

The values track exchange rates, so for each minute interval (bar) there is an Open price from the start of the minute and a Close price from the end of the minute. The High and Low values represent the highest and lowest rates during each individual minute.

Desired Result

I want to regroup this data into 5-minute intervals to get the following result:

    MinuteBar                Open      Close     Low       High
    2015-01-01 17:00:00.000  1.557870  1.558030  1.557870  1.558100
    2015-01-01 17:05:00.000  1.558580  1.557970  1.557870  1.558710

This takes the Open value from the first minute of each group of 5 and the Close value from the last minute of the 5. The High and Low values represent the highest High and the lowest Low rates across the 5-minute period.

Current solution

I have a solution that does this (below), but it feels inelegant because it relies on the Id values and on self joins. I also intend to run it on much larger datasets, so I would like to do it more efficiently if possible:

    -- Create a column to allow grouping in 5 minute intervals
    SELECT  Id, MinuteBar, [Open], High, Low, [Close],
            DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar)/5 AS Interval
    INTO    #5MinuteData
    FROM    #MinuteData
    ORDER BY minutebar;

    -- Group by interval and aggregate prior to self join
    SELECT  Interval ,
            MIN(MinuteBar) AS MinuteBar ,
            MIN(Id) AS OpenId ,
            MAX(Id) AS CloseId ,
            MIN(Low) AS Low ,
            MAX(High) AS High
    INTO    #DataMinMax
    FROM    #5MinuteData
    GROUP BY Interval;

    -- Self join to get the Open and Close values
    SELECT  t1.Interval ,
            t1.MinuteBar ,
            tOpen.[Open] ,
            tClose.[Close] ,
            t1.Low ,
            t1.High
    FROM    #DataMinMax t1
            INNER JOIN #5MinuteData tOpen ON tOpen.Id = OpenId
            INNER JOIN #5MinuteData tClose ON tClose.Id = CloseId;

    DROP TABLE #DataMinMax;
    DROP TABLE #5MinuteData;

Attempted Rework

Instead of the queries above, I have considered using FIRST_VALUE and LAST_VALUE, since they seem to be what I am after, but I can't get them to work with the grouping that I am doing. There may be a better solution than what I am attempting, so I am open to suggestions. This is what I am currently trying:

    SELECT  MIN(MinuteBar) MinuteBar5 ,
            FIRST_VALUE([Open]) OVER (ORDER BY MinuteBar) AS Opening,
            MAX(High) AS High ,
            MIN(Low) AS Low ,
            LAST_VALUE([Close]) OVER (ORDER BY MinuteBar) AS Closing ,
            DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval
    FROM    #MinuteData
    GROUP BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5

This gives me the error below. It is related to FIRST_VALUE and LAST_VALUE, since the query runs if I remove those lines:

Column '#MinuteData.MinuteBar' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

3 answers
    SELECT  MIN(MinuteBar) AS MinuteBar5,
            Opening,
            MAX(High) AS High,
            MIN(Low) AS Low,
            Closing,
            Interval
    FROM
    (
        SELECT  FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5
                                          ORDER BY MinuteBar) AS Opening,
                FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5
                                           ORDER BY MinuteBar DESC) AS Closing,
                DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval,
                *
        FROM    #MinuteData
    ) AS T
    GROUP BY Interval, Opening, Closing

This solution is close to your current one. There are two places where you went wrong.

  • FIRST_VALUE and LAST_VALUE are analytic functions that work on a window or partition, not on a group. You can run the subquery on its own to see its result.
  • LAST_VALUE returns the last value of the current window, which your query does not specify. The default window runs from the first row of the current partition to the current row. You can either use FIRST_VALUE with descending order, or specify the window explicitly:

     LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5
                               ORDER BY MinuteBar
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing,
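
To make the second point concrete, here is a minimal sketch that compares LAST_VALUE under the default frame with LAST_VALUE under an explicit full-partition frame. The table variable @Bars and the column aliases DefaultFrame and FullFrame are illustrative names, not part of the original question; the three rows are copied from the first three bars of the sample data.

```sql
-- Hypothetical cut-down copy of the sample data (first three bars only)
DECLARE @Bars TABLE (MinuteBar DATETIME, [Close] NUMERIC(12, 6));

INSERT INTO @Bars (MinuteBar, [Close])
VALUES ('2015-01-01 17:00:00', 1.557880),
       ('2015-01-01 17:01:00', 1.557880),
       ('2015-01-01 17:02:00', 1.558040);

SELECT  MinuteBar,
        [Close],
        -- No frame given: with ORDER BY the frame defaults to
        -- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
        -- so the "last" row of the frame is the current row
        LAST_VALUE([Close]) OVER (ORDER BY MinuteBar) AS DefaultFrame,
        -- Explicit frame spanning the whole partition
        LAST_VALUE([Close]) OVER (ORDER BY MinuteBar
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING) AS FullFrame
FROM    @Bars
ORDER BY MinuteBar;
```

Because every MinuteBar here is distinct, DefaultFrame simply repeats each row's own Close, while FullFrame returns the Close of the 17:02 bar on every row. The second behaviour is the one needed for a per-interval closing price.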

Here is one way to do this without temporary tables:

    ;WITH CTEInterval AS
    (
        -- This replaces your first temporary table (#5MinuteData)
        SELECT  [Id], [MinuteBar], [Open], [High], [Low], [Close],
                DATEPART(MINUTE, MinuteBar)/5 AS Interval
        FROM    #MinuteData
    ), CTEOpenClose as
    (
        -- this is instead of your second temporary table (#DataMinMax)
        SELECT  [Id], [MinuteBar],
                FIRST_VALUE([Open]) OVER (PARTITION BY Interval ORDER BY MinuteBar) As [Open],
                [High], [Low],
                FIRST_VALUE([Close]) OVER (PARTITION BY Interval ORDER BY MinuteBar DESC) As [Close],
                Interval
        FROM    CTEInterval
    )
    -- This is the final select
    SELECT  MIN([MinuteBar]) as [MinuteBar],
            AVG([Open]) as [Open],   -- All values of [Open] in the same interval are the same...
            AVG([Close]) as [Close], -- All values of [Close] in the same interval are the same...
            MIN([Low]) as [Low],
            MAX([High]) as [High]
    FROM    CTEOpenClose
    GROUP BY Interval

Results:

    MinuteBar                Open      Close     Low       High
    2015-01-01 17:00:00.000  1.557870  1.558030  1.557870  1.558100
    2015-01-01 17:05:00.000  1.558580  1.557970  1.557870  1.558710
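
One caveat when moving past the sample data: DATEPART(MINUTE, ...) returns only the minute within the hour, so bars from different hours would fall into the same Interval. The DATEDIFF expression from the question does not have this limitation. The sketch below compares the two on a hypothetical table variable @Samples holding the same minute offset in two different hours; the aliases DatePartInterval and DateDiffInterval are illustrative names.

```sql
-- Hypothetical rows: the same minute-of-hour in two different hours
DECLARE @Samples TABLE (MinuteBar DATETIME);

INSERT INTO @Samples (MinuteBar)
VALUES ('2015-01-01 17:02:00'),
       ('2015-01-01 18:02:00');

SELECT  MinuteBar,
        -- Minute-of-hour bucket: identical for both rows
        DATEPART(MINUTE, MinuteBar) / 5 AS DatePartInterval,
        -- Minutes since a fixed anchor, bucketed: distinct for each row
        DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar) / 5 AS DateDiffInterval
FROM    @Samples
ORDER BY MinuteBar;
```

Both rows get a DatePartInterval of 0, while DateDiffInterval is 204 for the 17:02 bar and 216 for the 18:02 bar. For data that spans more than one hour, grouping on the DATEDIFF form keeps the 5-minute buckets distinct.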

Demo here

    ;with cte as
    (
        -- this can be a permanent table of intervals, rather than one generated on the fly
        select  cast('2015-01-01 17:00:00.000' as datetime) as interval,
                dateadd(mi, 5, '2015-01-01 17:00:00.000') as nxtinterval
        union all
        select  dateadd(mi, 5, interval), dateadd(mi, 5, nxtinterval)
        from    cte
        where   interval <= '2015-01-01 17:45:00.000'
    )
    ,finalcte as
    (
        select  minutebar, low, high,
                dense_rank() over (order by interval, nxtinterval) as grpd,
                -- order by minutebar within each interval so first/last are well defined;
                -- the explicit frame lets last_value see the whole partition
                last_value([close]) over (partition by interval, nxtinterval order by minutebar
                                          rows between unbounded preceding and unbounded following) as [close],
                first_value([open]) over (partition by interval, nxtinterval order by minutebar) as [open]
        from    cte c
        join    #minutedata m
                -- half-open range: a bar on an interval boundary belongs to one interval only
                on m.minutebar >= c.interval and m.minutebar < c.nxtinterval
    )
    select  min(minutebar) as minutebar,
            min(low) as 'low',
            max(high) as 'High',
            max([open]) as 'open',
            max([close]) as 'close'
    from    finalcte
    group by grpd

Source: https://habr.com/ru/post/1263289/

