I have unevenly distributed data (response date) for several years (2003-2008). I want to request data for a given set of start and end dates, grouping data by any of the supported intervals (day, week, month, quarter, year) in PostgreSQL 8.3 ( http://www.postgresql.org/docs/8.3/static/ functions-datetime.html # FUNCTIONS-DATETIME-TRUNC ).
The problem is that some of the queries produce results that are continuous for the required period, like this one:
select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id) from some_table where category_id=1 and entity_id = 77 and entity2_id = 115 and date <= '2008-12-06' and date >= '2007-12-01' group by date_trunc('month',date) order by date_trunc('month',date); to_char | count
but some of them skip some intervals because there is no data, since this:
select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id) from some_table where category_id=1 and entity_id = 75 and entity2_id = 115 and date <= '2008-12-06' and date >= '2007-12-01' group by date_trunc('month',date) order by date_trunc('month',date); to_char | count
where is the desired result set:
to_char | count ------------+------- 2007-12-01 | 2 2008-01-01 | 2 2008-02-01 | 0 2008-03-01 | 1 2008-04-01 | 2 2008-05-01 | 0 2008-06-01 | 1 2008-07-01 | 0 2008-08-01 | 3 2008-09-01 | 0 2008-10-01 | 2 2008-11-01 | 0 (12 rows)
Number 0 for missing entries.
I saw earlier discussions about stack overflow, but they do not solve my problem, because my grouping period is one of (day, week, month, quarter, year) and determined the execution time of the application. So an approach like left joining with a calendar table or sequence table won't help, I think.
My current solution to this is to fill these gaps in Python (in the Turbogears application) with a calendar module.
Is there a better way to do this.