With larger datasets, window functions are the most efficient way to perform such queries - tables will be scanned only once, and not once for each date, as self-join does. It also looks a lot easier. :) PostgreSQL 8.4 and above support window functions.
It looks like this:
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at) FROM subscriptions GROUP BY created_at;
Here OVER creates the window; ORDER BY created_at means that it must sum the counts in the created_at order.
Edit: If you want to delete duplicate emails within one day, you can use sum(count(distinct email)) . Unfortunately, this will not remove duplicates that intersect different dates.
If you want to remove all duplicates, I think the easiest way is to use the subquery and DISTINCT ON . This will associate the letters with their earliest date (because I sort by created_at in ascending order, he will select the earliest of them):
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at) FROM ( SELECT DISTINCT ON (email) created_at, email FROM subscriptions ORDER BY email, created_at ) AS subq GROUP BY created_at;
If you create an index on (email, created_at) , this request should also not be too slow.
(If you want to test, here's how I created the sample dataset)
create table subscriptions as select date '2000-04-04' + (i/10000)::int as created_at, 'foofoobar@foobar.com' || (i%700000)::text as email from generate_series(1,1000000) i; create index on subscriptions (email, created_at);
intgr Apr 18 '11 at 9:12 2011-04-18 09:12
source share