The special difficulty of this task: you cannot just pick data points inside your time range; you also have to take into account the latest data point before the time range and the earliest data point after it. This varies per row, and either data point may or may not exist. That calls for a sophisticated query and makes it hard to use indexes.
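For reference, a minimal sketch of the table layout the queries below assume (table and column names are taken from the query; the exact types and constraints are my assumption):

-- Assumed table layout (sketch): one row per change,
-- valid from "day" until the next row for the same item.
CREATE TABLE stock (
  store_id   int  NOT NULL
, product_id int  NOT NULL
, day        date NOT NULL  -- first day the value applies
, value      int  NOT NULL  -- could just as well be numeric
, PRIMARY KEY (store_id, product_id, day)
);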
You can use range types and operators (Postgres 9.2+) to simplify the calculation:
WITH input(a, b) AS (
   SELECT '2013-01-01'::date  -- your time frame here
        , '2013-01-15'::date  -- inclusive borders
   )
SELECT store_id, product_id
     , sum(upper(days) - lower(days)) AS days_in_range
     , round(sum(value * (upper(days) - lower(days)))::numeric
             / (SELECT b - a + 1 FROM input), 2) AS your_result
     , round(sum(value * (upper(days) - lower(days)))::numeric
             / sum(upper(days) - lower(days)), 2) AS my_result
FROM  (
   SELECT store_id, product_id, value, s.day_range * x.day_range AS days
   FROM  (
      SELECT store_id, product_id, value
           , daterange(day, lead(day, 1, now()::date)
             OVER (PARTITION BY store_id, product_id ORDER BY day)) AS day_range
      FROM   stock
      ) s
   JOIN  (SELECT daterange(a, b + 1) AS day_range FROM input) x
         ON s.day_range && x.day_range
   ) sub
GROUP  BY 1, 2
ORDER  BY 1, 2;
Note: I use the column name day instead of date. I never use base type names as column names.
In the subquery sub, I fetch the day from the next row for each item with the lead() window function, using its built-in option to default to "today" where there is no next row.
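To see just that building block in isolation (a standalone sketch, not part of the solution):

-- lead() in isolation: next "day" per item, defaulting to today for the last row
SELECT store_id, product_id, day, value
     , lead(day, 1, now()::date)
       OVER (PARTITION BY store_id, product_id ORDER BY day) AS next_day
FROM   stock
ORDER  BY store_id, product_id, day;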
With that I form the daterange and match it against the input with the overlap operator &&, computing the resulting date range with the intersection operator *.
All ranges here are built with an exclusive upper bound. That's why I add one day to the input range. This way we can simply subtract lower(range) from upper(range) to get the number of days.
I assume that "yesterday" is the last day with reliable data. Today can still change in real life. Therefore, I use today ( now()::date ) as an exclusive upper bound for open ranges.
I provide two results:
your_result is consistent with your displayed results.
You divide by the number of days in your date range unconditionally. For instance, if an item is only listed for the last day, you get a very low (misleading!) "average".
my_result computes the same or higher numbers.
I divide by the actual number of days the item is listed. For instance, if an item is only listed for the last day, I return its listed value as the average.
To illustrate the difference, I added the count of days the item was listed within the range: days_in_range (a worked example follows below).
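A quick, made-up example of the difference: with the time frame 2013-01-01 to 2013-01-15 (15 days) and an item listed only on the last day with value 10, your_result = 10 * 1 / 15 = 0.67, while my_result = 10 * 1 / 1 = 10.00, and days_in_range = 1.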
SQL Fiddle
Index and Performance
With this kind of data, old rows typically never change. This would make an excellent case for a materialized view:
CREATE MATERIALIZED VIEW mv_stock AS
SELECT store_id, product_id, value
     , daterange(day, lead(day, 1, now()::date)
       OVER (PARTITION BY store_id, product_id ORDER BY day)) AS day_range
FROM   stock;
Then you can add a GiST index that supports the relevant operator &&:
CREATE INDEX mv_stock_range_idx ON mv_stock USING gist (day_range);
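The main query can then read from the materialized view instead of deriving the ranges on the fly. A sketch of the adapted query (remember that an MV is a snapshot, so run REFRESH MATERIALIZED VIEW mv_stock; after data changes; materialized views require Postgres 9.3+):

-- Sketch: same calculation, but reading pre-computed ranges from the MV
WITH input(a, b) AS (SELECT '2013-01-01'::date, '2013-01-15'::date)
SELECT store_id, product_id
     , round(sum(value * (upper(days) - lower(days)))::numeric
             / sum(upper(days) - lower(days)), 2) AS my_result
FROM  (
   SELECT store_id, product_id, value, s.day_range * x.day_range AS days
   FROM   mv_stock s
   JOIN  (SELECT daterange(a, b + 1) AS day_range FROM input) x
          ON s.day_range && x.day_range
   ) sub
GROUP  BY 1, 2
ORDER  BY 1, 2;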
Big test case
I ran a more realistic test with 200k rows. The query using the MV was about 6 times faster, which in turn was ~10 times faster than @Joop's query. Performance depends heavily on data distribution. An MV helps most with big tables and a high query frequency. Also, if the table has columns that are not relevant to this query, the MV can be smaller. A question of cost and benefit.
I put all the solutions published so far (and adapted them) into one big fiddle to play with:
SQL Fiddle with big test case.
SQL Fiddle with only 40k rows - to avoid a timeout on sqlfiddle.com.