Cumulative total in PostgreSQL

I use COUNT and GROUP BY to get the number of subscribers for each day:

    SELECT created_at, COUNT(email)
    FROM subscriptions
    GROUP BY created_at;

Result:

    created_at    count
    -------------------
    04-04-2011    100
    05-04-2011    50
    06-04-2011    50
    07-04-2011    300

I want to get the cumulative total of subscribers for each day instead. How can I get it?

    created_at    count
    -------------------
    04-04-2011    100
    05-04-2011    150
    06-04-2011    200
    07-04-2011    500
+43
sql aggregate-functions postgresql
Apr 18 '11 at 4:14
5 answers

With larger datasets, window functions are the most efficient way to run this kind of query: the table is scanned only once, rather than once per date as with a self-join. The query is also much simpler. :) PostgreSQL 8.4 and above support window functions.

It looks like this:

    SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
    FROM subscriptions
    GROUP BY created_at;

Here OVER defines the window; ORDER BY created_at means that the counts are summed in created_at order, so each row gets its own count plus the counts of all earlier dates.
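To make the window behaviour concrete, here is a minimal sketch on a scaled-down data set. The table name subscriptions_demo, the example.com addresses and the column aliases are made up for illustration. The frame is spelled out explicitly; when ORDER BY is given without a frame, PostgreSQL defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which gives the same result here because created_at is unique after GROUP BY.

```sql
-- Hypothetical sample data: 2, 1, 1 and 3 sign-ups on four consecutive days.
CREATE TEMP TABLE subscriptions_demo (created_at date, email text);
INSERT INTO subscriptions_demo VALUES
    ('2011-04-04', 'a@example.com'),
    ('2011-04-04', 'b@example.com'),
    ('2011-04-05', 'c@example.com'),
    ('2011-04-06', 'd@example.com'),
    ('2011-04-07', 'e@example.com'),
    ('2011-04-07', 'f@example.com'),
    ('2011-04-07', 'g@example.com');

-- count(email) is evaluated per group first; the window sum then runs
-- over the grouped rows in created_at order.
SELECT created_at,
       count(email) AS daily_count,
       sum(count(email)) OVER (
           ORDER BY created_at
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM subscriptions_demo
GROUP BY created_at
ORDER BY created_at;
-- daily_count:   2, 1, 1, 3
-- running_total: 2, 3, 4, 7
```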




Edit: If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately, this will not remove duplicates that span different dates.

If you want to remove all duplicates, I think the easiest way is to use a subquery and DISTINCT ON. This assigns each email to its earliest date (because I sort by created_at in ascending order, it picks the earliest one):

    SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
    FROM (
        SELECT DISTINCT ON (email) created_at, email
        FROM subscriptions
        ORDER BY email, created_at
    ) AS subq
    GROUP BY created_at;

If you create an index on (email, created_at), this query should not be too slow either.
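As a small illustration of the DISTINCT ON approach, the sketch below uses a made-up table subscriptions_dupes in which one address signs up on two different days. The table name, the addresses and the alias first_seen are assumptions for the example only.

```sql
-- Hypothetical sample data: a@example.com appears on two different dates.
CREATE TEMP TABLE subscriptions_dupes (created_at date, email text);
INSERT INTO subscriptions_dupes VALUES
    ('2011-04-04', 'a@example.com'),
    ('2011-04-05', 'a@example.com'),  -- repeat of an earlier address
    ('2011-04-05', 'b@example.com');

-- DISTINCT ON (email) keeps one row per email; ORDER BY email, created_at
-- makes that row the one with the earliest date.
SELECT created_at,
       sum(count(email)) OVER (ORDER BY created_at) AS running_total
FROM (
    SELECT DISTINCT ON (email) created_at, email
    FROM subscriptions_dupes
    ORDER BY email, created_at
) AS first_seen
GROUP BY created_at
ORDER BY created_at;
-- running_total: 1, 2   (without the subquery it would be 1, 3)
```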




(If you want to test, here's how I created the sample dataset)

    create table subscriptions as
    select date '2000-04-04' + (i/10000)::int as created_at,
           'foofoobar@foobar.com' || (i%700000)::text as email
    from generate_series(1,1000000) i;

    create index on subscriptions (email, created_at);
+74
Apr 18 '11 at 9:12

Use:

    SELECT a.created_at,
           (SELECT COUNT(b.email)
            FROM SUBSCRIPTIONS b
            WHERE b.created_at <= a.created_at) AS count
    FROM SUBSCRIPTIONS a
+6
Apr 18 '11 at 4:19
    SELECT s1.created_at, COUNT(s2.email) AS cumul_count
    FROM subscriptions s1
    INNER JOIN subscriptions s2 ON s1.created_at >= s2.created_at
    GROUP BY s1.created_at
+2
Apr 18 2018-11-11T00:

I assume that you want only one row per day, and that you also want to show days without any subscriptions (if nobody signs up on a particular date, do you want that date shown with the previous day's total?). If so, you can use a WITH clause:

    with recursive serialdates(adate) as (
        select cast('2011-04-04' as date)
        union all
        select adate + 1
        from serialdates
        where adate < cast('2011-04-07' as date)
    )
    select D.adate,
           (
               select count(distinct email)
               from subscriptions
               where created_at between date_trunc('month', D.adate) and D.adate
           )
    from serialdates D
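A different way to get the same gap-filling effect, shown here only as a sketch and not as what this answer does: build the date list with generate_series, LEFT JOIN the subscriptions onto it, and reuse the window sum. Unlike the query above, it counts from the first listed date instead of the start of the month and does not de-duplicate emails. The table name subscriptions_gaps, the addresses and the aliases are made up for illustration.

```sql
-- Hypothetical sample data with no sign-ups on 2011-04-05 or 2011-04-07.
CREATE TEMP TABLE subscriptions_gaps (created_at date, email text);
INSERT INTO subscriptions_gaps VALUES
    ('2011-04-04', 'a@example.com'),
    ('2011-04-04', 'b@example.com'),
    ('2011-04-06', 'c@example.com');

-- generate_series yields one row per day; count(s.email) is 0 for days
-- with no match, so the running total simply carries over.
SELECT d.day::date AS created_at,
       sum(count(s.email)) OVER (ORDER BY d.day) AS running_total
FROM generate_series(date '2011-04-04', date '2011-04-07', interval '1 day') AS d(day)
LEFT JOIN subscriptions_gaps s ON s.created_at = d.day::date
GROUP BY d.day
ORDER BY d.day;
-- running_total: 2, 2, 3, 3
```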
+2
Apr 18 '11 at 7:23

The best way is to have a calendar table:

    calendar (date date, month int, quarter int, half int, weekly int, year int)

You can then join against this table to summarize by whichever field you need.

-3
Jul 18 '14 at 9:56