Postgres - how to return rows with 0 count for missing data?

I have unevenly distributed data (with respect to date) for a few years (2003-2008). I want to query data for a given start and end date, grouping the data by any of the supported intervals (day, week, month, quarter, year) in PostgreSQL 8.3 ( http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC ).

The problem is that some of the queries produce results that are continuous for the required period, like this one:

    select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id)
    from some_table
    where category_id=1 and entity_id = 77 and entity2_id = 115
      and date <= '2008-12-06' and date >= '2007-12-01'
    group by date_trunc('month',date)
    order by date_trunc('month',date);

      to_char   | count
    ------------+-------
     2007-12-01 |    64
     2008-01-01 |    31
     2008-02-01 |    14
     2008-03-01 |    21
     2008-04-01 |    28
     2008-05-01 |    44
     2008-06-01 |   100
     2008-07-01 |    72
     2008-08-01 |    91
     2008-09-01 |    92
     2008-10-01 |    79
     2008-11-01 |    65
    (12 rows)

but some of them skip intervals for which there is no data, like this one:

    select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id)
    from some_table
    where category_id=1 and entity_id = 75 and entity2_id = 115
      and date <= '2008-12-06' and date >= '2007-12-01'
    group by date_trunc('month',date)
    order by date_trunc('month',date);

      to_char   | count
    ------------+-------
     2007-12-01 |     2
     2008-01-01 |     2
     2008-03-01 |     1
     2008-04-01 |     2
     2008-06-01 |     1
     2008-08-01 |     3
     2008-10-01 |     2
    (7 rows)

where the desired result set is:

      to_char   | count
    ------------+-------
     2007-12-01 |     2
     2008-01-01 |     2
     2008-02-01 |     0
     2008-03-01 |     1
     2008-04-01 |     2
     2008-05-01 |     0
     2008-06-01 |     1
     2008-07-01 |     0
     2008-08-01 |     3
     2008-09-01 |     0
     2008-10-01 |     2
     2008-11-01 |     0
    (12 rows)

That is, a count of 0 for the missing entries.

I have seen earlier discussions on Stack Overflow, but they do not seem to solve my problem, because my grouping period is one of (day, week, month, quarter, year) and is decided at runtime by the application. So an approach like a left join with a calendar table or sequence table will not help, I think.

My current solution is to fill in these gaps in Python (in a TurboGears application) using the calendar module.
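
For readers wondering what that client-side gap filling might look like, here is a minimal sketch for the monthly case. It is not the asker's actual code: the helper names month_starts and fill_missing are made up for illustration, it uses datetime rather than the calendar module, and it assumes the query rows arrive as (date, count) pairs keyed by the first day of the month.

```python
from datetime import date


def month_starts(start, end):
    """Yield the first day of each month from start's month through end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_missing(rows, start, end):
    """Return one (month_start, count) pair per month, using 0 where rows has no entry.

    rows is assumed to be an iterable of (date, count) pairs as produced by
    grouping on date_trunc('month', ...) and casting the result to date.
    """
    counts = dict(rows)
    return [(d, counts.get(d, 0)) for d in month_starts(start, end)]


# Hypothetical sparse result: February and April have no rows.
sparse = [(date(2008, 1, 1), 2), (date(2008, 3, 1), 1)]
filled = fill_missing(sparse, date(2008, 1, 1), date(2008, 4, 30))
```

Other grouping periods would need their own generator of bucket start dates, which is part of why doing this in the database is attractive.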

Is there a better way to do this?

+11
python database left-join postgresql generate-series
Dec 6 '08 at 9:32 a.m.
3 answers

You can create a list of the first day of every month in the past year (say) with

    select distinct date_trunc('month', (current_date - offs)) as date
    from generate_series(0,365,28) as offs;

              date
    ------------------------
     2007-12-01 00:00:00+01
     2008-01-01 00:00:00+01
     2008-02-01 00:00:00+01
     2008-03-01 00:00:00+01
     2008-04-01 00:00:00+02
     2008-05-01 00:00:00+02
     2008-06-01 00:00:00+02
     2008-07-01 00:00:00+02
     2008-08-01 00:00:00+02
     2008-09-01 00:00:00+02
     2008-10-01 00:00:00+02
     2008-11-01 00:00:00+01
     2008-12-01 00:00:00+01

Then you can left join your counts against this series.

+16
Dec 6 '08 at 11:30

This question is old. But since fellow users picked it as the master for a new duplicate, I am adding a proper answer.

Proper solution

    SELECT *
    FROM  (
       SELECT day::date
       FROM   generate_series(timestamp '2007-12-01'
                            , timestamp '2008-12-01'
                            , interval  '1 month') day
       ) d
    LEFT   JOIN (
       SELECT date_trunc('month', date_col)::date AS day
            , count(*) AS some_count
       FROM   tbl
       WHERE  date_col >= date '2007-12-01'
       AND    date_col <= date '2008-12-06'
    -- AND    ... more conditions
       GROUP  BY 1
       ) t USING (day)
    ORDER  BY day;

  • Use LEFT JOIN, of course.

  • generate_series() can produce a table of timestamps on the fly, and very fast.

  • It is generally faster to aggregate before you join. I recently provided a test case on sqlfiddle.com in this related answer:

    • PostgreSQL - Array Order
  • Cast the timestamp to date ( ::date ) for a basic format. For more, use to_char() .

  • GROUP BY 1 is syntax shorthand to reference the first output column. It could be GROUP BY day as well, but that might conflict with an existing column of the same name. Or GROUP BY date_trunc('month', date_col)::date , but that is too long for my taste.

  • Works with the available interval arguments for date_trunc() .

  • count() never produces NULL ( 0 for no rows), but the LEFT JOIN does.
    To return 0 instead of NULL in the outer SELECT , use COALESCE(some_count, 0) AS some_count . See the manual.

  • For a more general solution or arbitrary time intervals, consider this closely related answer:

    • Best way to count records at arbitrary time intervals in Rails + Postgres
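
Since the question says the grouping period is chosen at runtime, the query above can be parameterized from the application. The following is only a sketch under stated assumptions: build_count_query and STEPS are hypothetical names, tbl and date_col are the placeholder identifiers from the query above, and the %(name)s placeholders assume a psycopg2-style driver. 'quarter' is mapped to '3 months' because it is a date_trunc() field rather than an interval unit, and the series bounds are truncated with date_trunc() so that week and quarter buckets line up with the grouped rows. As far as I recall, the timestamp form of generate_series() needs PostgreSQL 8.4 or later.

```python
# Interval step for each grouping unit the question mentions (illustrative mapping).
STEPS = {
    "day": "1 day",
    "week": "1 week",
    "month": "1 month",
    "quarter": "3 months",  # 'quarter' is not an interval unit, so spell it out
    "year": "1 year",
}

# tbl and date_col are placeholders taken from the answer, not real object names.
QUERY = """
SELECT bucket, COALESCE(some_count, 0) AS some_count
FROM  (
   SELECT ts::date AS bucket
   FROM   generate_series(date_trunc(%(unit)s, %(start)s::timestamp)
                        , date_trunc(%(unit)s, %(end)s::timestamp)
                        , %(step)s::interval) AS ts
   ) d
LEFT   JOIN (
   SELECT date_trunc(%(unit)s, date_col)::date AS bucket
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= %(start)s
   AND    date_col <= %(end)s
   GROUP  BY 1
   ) t USING (bucket)
ORDER  BY bucket
"""


def build_count_query(unit, start, end):
    """Return (sql, params) for one supported grouping unit; reject anything else."""
    if unit not in STEPS:
        raise ValueError("unsupported grouping unit: %r" % (unit,))
    params = {"unit": unit, "step": STEPS[unit], "start": start, "end": end}
    return QUERY, params


sql, params = build_count_query("quarter", "2007-12-01", "2008-12-06")
```

With a driver that accepts this placeholder style, the pair would be passed as cursor.execute(sql, params). Whitelisting the unit keeps the step and the truncation field consistent with each other.
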
+16
Mar 31 '13 at 18:44

You could create a temporary table at runtime and left join on that. That seems to make the most sense.

0
Dec 6 '08 at 10:54


