SQL query for aggregate data list frequency

I have a list of times in a database column (representing visits to a website).

I need to group them into intervals, and then get a table of the "cumulative frequency" of these dates.

For example, I could have:

    9:01  9:04  9:11  9:13  9:22  9:24  9:28

and I want to convert it to:

    9:05 - 2
    9:15 - 4
    9:25 - 6
    9:30 - 7

How can I do this? Can I even achieve this easily in SQL? I can do it easily in C#.

+4
6 answers
    create table accu_times (
        time_val datetime not null,
        constraint pk_accu_times primary key (time_val)
    );
    go

    insert into accu_times values ('9:01');
    insert into accu_times values ('9:05');
    insert into accu_times values ('9:11');
    insert into accu_times values ('9:13');
    insert into accu_times values ('9:22');
    insert into accu_times values ('9:24');
    insert into accu_times values ('9:28');
    go

    select rounded_time,
           (
               select count(*)
               from accu_times as at2
               where at2.time_val <= rt.rounded_time
           ) as accu_count
    from (
        select distinct
               dateadd(minute,
                   round((datepart(minute, at.time_val) + 2) * 2, -1) / 2,
                   dateadd(hour, datepart(hour, at.time_val), 0)
               ) as rounded_time
        from accu_times as at
    ) as rt
    go

    drop table accu_times

Results in:

    rounded_time            accu_count
    ----------------------- -----------
    1900-01-01 09:05:00.000 2
    1900-01-01 09:15:00.000 4
    1900-01-01 09:25:00.000 6
    1900-01-01 09:30:00.000 7
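The minute-rounding expression is a bit opaque, so here is a small check (not part of the original answer) showing what it does to a few sample minute values; it rounds a minute up to the next multiple of 5, leaving exact multiples of 5 unchanged:

    -- Not from the original answer: (minute + 2) * 2, rounded to the nearest
    -- ten and halved, rounds the minute up to the next multiple of 5.
    select m as minute_value,
           round((m + 2) * 2, -1) / 2 as rounded_to_five
    from (
        select 1 as m union all
        select 4 union all
        select 11 union all
        select 28
    ) as sample_minutes;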
+8

I should point out that, based on the stated intent of the problem (analyzing visitor traffic), I wrote this statement to summarize the counts over homogeneous groups.

Otherwise (as with the "approximate" groups), counts over a 5-minute interval would be compared with counts over a 10-minute interval, which makes no sense.

You have to understand the intent of the user's requirement, not the literal reading of it. :-)

    create table #myDates (
        myDate datetime
    );
    go

    insert into #myDates values ('10/02/2008 09:01:23');
    insert into #myDates values ('10/02/2008 09:03:23');
    insert into #myDates values ('10/02/2008 09:05:23');
    insert into #myDates values ('10/02/2008 09:07:23');
    insert into #myDates values ('10/02/2008 09:11:23');
    insert into #myDates values ('10/02/2008 09:14:23');
    insert into #myDates values ('10/02/2008 09:19:23');
    insert into #myDates values ('10/02/2008 09:21:23');
    insert into #myDates values ('10/02/2008 09:21:23');
    insert into #myDates values ('10/02/2008 09:21:23');
    insert into #myDates values ('10/02/2008 09:21:23');
    insert into #myDates values ('10/02/2008 09:21:23');
    insert into #myDates values ('10/02/2008 09:26:23');
    insert into #myDates values ('10/02/2008 09:27:23');
    insert into #myDates values ('10/02/2008 09:29:23');
    go

    declare @interval int;
    set @interval = 10;

    select convert(varchar(5),
               dateadd(minute, @interval - datepart(minute, myDate) % @interval, myDate),
               108) as timeGroup,
           count(*)
    from #myDates
    group by convert(varchar(5),
               dateadd(minute, @interval - datepart(minute, myDate) % @interval, myDate),
               108);

Returns:

    timeGroup
    --------- -----------
    09:10     4
    09:20     3
    09:30     8
+3

Ooh, this is all too complicated.

Convert to seconds, divide by the bucket interval, truncate, and convert back:

    select sec_to_time(floor(time_to_sec(d)/300)*300), count(*)
    from d
    group by sec_to_time(floor(time_to_sec(d)/300)*300)

Using Ron Savage's data, I get:

    +----------+----------+
    | i        | count(*) |
    +----------+----------+
    | 09:00:00 |        1 |
    | 09:05:00 |        3 |
    | 09:10:00 |        1 |
    | 09:15:00 |        1 |
    | 09:20:00 |        6 |
    | 09:25:00 |        2 |
    | 09:30:00 |        1 |
    +----------+----------+

You can use ceil() or round() instead of floor().
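For example, a sketch of the ceil() variant (assuming the same table d created below); it labels each bucket by its end time, which is closer to the 9:05 / 9:15 style of output in the question:

    -- ceil() assigns each row to the end of its 5-minute bucket;
    -- a time that falls exactly on a boundary stays on that boundary.
    select sec_to_time(ceil(time_to_sec(d) / 300) * 300) as bucket_end,
           count(*)
    from d
    group by bucket_end;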

Update: for a table created with

    create table d ( d datetime );
+2

Create a periods table that describes the periods you want to divide the day into.

    SELECT periods.name, COUNT(times.time)
    FROM periods, times
    WHERE periods.period_start <= times.time
      AND times.time < periods.period_end
    GROUP BY periods.name;
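A minimal sketch of what the periods and times tables might look like (the names and boundary values here are illustrative assumptions, not from the original answer):

    create table periods (
        name         varchar(20) not null,
        period_start datetime    not null,
        period_end   datetime    not null
    );

    create table times (
        time datetime not null
    );

    -- Illustrative boundaries matching the intervals in the question.
    insert into periods values ('to 9:05', '2008-10-02 09:00', '2008-10-02 09:05');
    insert into periods values ('to 9:15', '2008-10-02 09:05', '2008-10-02 09:15');
    insert into periods values ('to 9:25', '2008-10-02 09:15', '2008-10-02 09:25');
    insert into periods values ('to 9:30', '2008-10-02 09:25', '2008-10-02 09:30');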
+1

Create a table that holds the intervals you want, then join the two tables together.

For instance:

    time_entry.time_entry
    -----------------------
    2008-10-02 09:01:00.000
    2008-10-02 09:04:00.000
    2008-10-02 09:11:00.000
    2008-10-02 09:13:00.000
    2008-10-02 09:22:00.000
    2008-10-02 09:24:00.000
    2008-10-02 09:28:00.000

    time_interval.time_end
    -----------------------
    2008-10-02 09:05:00.000
    2008-10-02 09:15:00.000
    2008-10-02 09:25:00.000
    2008-10-02 09:30:00.000

    SELECT ti.time_end, COUNT(*) AS 'interval_total'
    FROM time_interval ti
    INNER JOIN time_entry te ON te.time_entry < ti.time_end
    GROUP BY ti.time_end;

    time_end                interval_total
    ----------------------- --------------
    2008-10-02 09:05:00.000 2
    2008-10-02 09:15:00.000 4
    2008-10-02 09:25:00.000 6
    2008-10-02 09:30:00.000 7

If, instead of cumulative totals, you want the count within each range, add a time_start column to the time_interval table and change the query to:

    SELECT ti.time_end, COUNT(*) AS 'interval_total'
    FROM time_interval ti
    INNER JOIN time_entry te
        ON te.time_entry >= ti.time_start
       AND te.time_entry < ti.time_end
    GROUP BY ti.time_end;
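For completeness, a sketch of how the amended time_interval table might be created and populated (the boundary values are assumptions chosen to match the sample data above, not part of the original answer):

    create table time_interval (
        time_start datetime not null,
        time_end   datetime not null
    );

    insert into time_interval values ('2008-10-02 09:00:00', '2008-10-02 09:05:00');
    insert into time_interval values ('2008-10-02 09:05:00', '2008-10-02 09:15:00');
    insert into time_interval values ('2008-10-02 09:15:00', '2008-10-02 09:25:00');
    insert into time_interval values ('2008-10-02 09:25:00', '2008-10-02 09:30:00');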
+1

This uses quite a few SQL tricks (SQL Server 2005):

    CREATE TABLE [dbo].[stackoverflow_165571](
        [visit] [datetime] NOT NULL
    ) ON [PRIMARY]
    GO

    ;WITH buckets AS (
        SELECT dateadd(mi,
                   (1 + datediff(mi, 0, visit - 1 - dateadd(dd, 0, datediff(dd, 0, visit))) / 5) * 5,
                   0) AS visit_bucket
              ,COUNT(*) AS visit_count
        FROM stackoverflow_165571
        GROUP BY dateadd(mi,
                   (1 + datediff(mi, 0, visit - 1 - dateadd(dd, 0, datediff(dd, 0, visit))) / 5) * 5,
                   0)
    )
    SELECT LEFT(CONVERT(varchar, l.visit_bucket, 8), 5)
           + ' - '
           + CONVERT(varchar, SUM(r.visit_count))
    FROM buckets l
    LEFT JOIN buckets r
        ON r.visit_bucket <= l.visit_bucket
    GROUP BY l.visit_bucket
    ORDER BY l.visit_bucket

Note that it puts all the times on the same day and assumes they are stored in a datetime column. The only thing it doesn't do, unlike your example, is remove the leading zeros from the time representation.
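If you also want the leading zeros gone, one option (an assumption on my part, not from the original answer) is to wrap the formatted text in a CASE; a minimal sketch:

    -- Hypothetical helper, not part of the original answer: strip a leading
    -- zero from an hh:mi string, e.g. '09:05' becomes '9:05'.
    SELECT CASE WHEN bucket_text LIKE '0%'
                THEN SUBSTRING(bucket_text, 2, LEN(bucket_text))
                ELSE bucket_text
           END AS trimmed_bucket
    FROM (SELECT '09:05' AS bucket_text) AS example;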

0

Source: https://habr.com/ru/post/1277429/

