Group by the end of the period instead of the start date

I want to aggregate data by the end date of a period, with some leading interval, rather than by its start. For example, I want to query a table and return the number of matching rows in the 30 days PRIOR to each end date listed in the results. The source table contains ONLY the sale date (a timestamp). Example:

    sales_timestamp
    -------------------
    2015-08-05 12:00:00
    2015-08-06 13:00:00
    2015-08-25 12:31:00
    2015-08-26 01:02:00
    2015-08-27 02:03:00
    2015-08-29 04:23:00
    2015-09-01 12:00:00
    2015-09-02 12:00:00
    2015-09-08 00:00:00

Example of the desired query result:

    date_period | count_of_sales
    ------------+---------------
    2015-08-24  | 2
    2015-08-31  | 6
    2015-09-07  | 6

Here the date_period 2015-09-07 means that the company sold 6 items in the 30 days ENDING on 9/7/2015 (and starting on ~8/7/2015, for a true 30-day period).

I have played with variations of the date_trunc() function, but I cannot get the truncation to apply to the end date rather than grouping by the start of the period.

The data will be hosted on PostgreSQL 9.1.

1 answer

This query does everything you ask for:

    SELECT day::date AS date_period, count_of_sales
    FROM  (
       SELECT *, sum(ct) OVER (ORDER BY day ROWS 30 PRECEDING) AS count_of_sales
       FROM   generate_series(date '2015-08-24' - 30  -- start 30 days earlier
                            , date '2015-09-07'
                            , interval '1 day') day
       LEFT   JOIN (
          SELECT date_trunc('day', sales_timestamp) AS day, count(*)::int AS ct
          FROM   sales
          GROUP  BY 1
          ) s USING (day)
       ) sub
    JOIN   generate_series(date '2015-08-24'
                         , date '2015-09-07'
                         , interval '1 week') day USING (day);
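As a sanity check on the sample data from the question, the same counts can probably be reproduced with a plain correlated subquery. This is a different and simpler technique than the window-function query above, not a replacement for it, and it re-scans the table once per end date. The temporary table below merely recreates the question's sample rows so that the sketch is self-contained; the range of 30 days before the end date plus the end date itself is an assumption chosen to match `ROWS 30 PRECEDING`.

```sql
-- Assumption: a throwaway copy of the sample data from the question.
CREATE TEMP TABLE sales (sales_timestamp timestamp);

INSERT INTO sales (sales_timestamp) VALUES
  ('2015-08-05 12:00:00'), ('2015-08-06 13:00:00'), ('2015-08-25 12:31:00')
, ('2015-08-26 01:02:00'), ('2015-08-27 02:03:00'), ('2015-08-29 04:23:00')
, ('2015-09-01 12:00:00'), ('2015-09-02 12:00:00'), ('2015-09-08 00:00:00');

-- One row per weekly end date; count sales from 30 days before the
-- end date up to and including the end date itself (31 calendar days).
SELECT d::date AS date_period
     , (SELECT count(*)
        FROM   sales s
        WHERE  s.sales_timestamp >= d - interval '30 days'
        AND    s.sales_timestamp <  d + interval '1 day') AS count_of_sales
FROM   generate_series(timestamp '2015-08-24'
                     , timestamp '2015-09-07'
                     , interval '1 week') d
ORDER  BY 1;
```

On the sample rows this should return 2, 6 and 6 for 2015-08-24, 2015-08-31 and 2015-09-07, the same figures the question asks for.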


Explanation

  • Create a complete set of the relevant days (1st generate_series()).
  • LEFT JOIN to the aggregated daily counts. The LEFT JOIN guarantees exactly one row per day, which lets us use a window frame defined by a number of rows.
  • Use sum() as a window aggregate function with a custom frame of 30 preceding rows. The frame also includes the current row, so it spans 31 days; use 29 instead if the period should be exactly 30 days including the end date. It is unclear how you count.
  • Join the result to the actual days you want in the output (2nd generate_series(), one day per week).
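The remark about 30 versus 29 can be checked directly. The sketch below uses only a generated series of days (no table is needed) and counts how many rows each frame covers for the last day of the series. The column names rows_with_30 and rows_with_29 are made up for illustration.

```sql
-- ROWS n PRECEDING is short for ROWS BETWEEN n PRECEDING AND CURRENT ROW,
-- so the frame holds n + 1 rows once enough preceding rows exist.
SELECT day::date AS last_day
     , count(*) OVER (ORDER BY day ROWS 30 PRECEDING) AS rows_with_30
     , count(*) OVER (ORDER BY day ROWS 29 PRECEDING) AS rows_with_29
FROM   generate_series(timestamp '2015-07-25'
                     , timestamp '2015-09-07'
                     , interval '1 day') day
ORDER  BY day DESC
LIMIT  1;
```

For 2015-09-07 this yields 31 and 30: the frame with 30 preceding rows reaches back to 2015-08-08, the one with 29 to 2015-08-09.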

Be aware that the definition of "day" derives from the current time zone setting of your session if you work with timestamptz, so results can differ between time zones. This does not apply to plain timestamp, which does not depend on the current time zone.
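A minimal sketch of that effect, assuming a timestamptz value rather than the plain timestamp column from the question: the same instant falls on different calendar days depending on the zone used for truncation. The zone names are arbitrary examples. Applying AT TIME ZONE explicitly makes the day boundary independent of the session setting.

```sql
-- One instant, truncated to the day in two explicitly named time zones.
-- AT TIME ZONE turns the timestamptz into a local timestamp for that zone.
SELECT date_trunc('day', ts AT TIME ZONE 'UTC')::date              AS day_utc
     , date_trunc('day', ts AT TIME ZONE 'America/New_York')::date AS day_new_york
FROM  (SELECT timestamptz '2015-08-26 01:02:00+00' AS ts) t;
```

This returns 2015-08-26 for UTC and 2015-08-25 for New York, because 01:02 UTC is still 21:02 on the previous evening there. A sale at that instant would land in a different daily bucket depending on the zone.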



Source: https://habr.com/ru/post/919353/

