I collect data from several API sources with Python and load it into two tables in Postgres.
I then use this data to build reports: merging, grouping, and filtering. Every day I add thousands of rows.
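For context, a minimal sketch of the daily load step. It uses the standard-library sqlite3 module as a stand-in for Postgres (in production this would be a driver such as psycopg2), and the table/column shape is an assumption based on the example below:

```python
import sqlite3

# Stand-in connection; in production this would point at Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cost (
        vendor_id TEXT, product_id TEXT,
        t1 TEXT, t2 TEXT, cost REAL
    )
""")

# Rows as fetched from one API source (shape is an assumption).
rows = [
    ("a", "a", "2018-01-01", "2018-04-18", 50),
    ("a", "b", "2018-05-01", "2018-04-18", 10),
]

# executemany batches the inserts instead of one statement per row.
conn.executemany("INSERT INTO cost VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT count(*) FROM cost").fetchone()[0]
print(count)  # 2
```

Batching like this keeps the daily load of thousands of rows to a handful of round-trips.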
Cost, revenue, and sales are always cumulative: each data point covers everything from t1 (the start date for that product) up to t2, the time the data was retrieved.
So the latest retrieval includes all data accumulated since t1. Both t1 and t2 are timestamp without time zone columns in Postgres. I am currently on Postgres 10.
Example:
id | vendor_id | product_id | t1         | t2         | cost | revenue | sales
 1 | a         | a          | 2018-01-01 | 2018-04-18 |   50 |     200 |    34
 2 | a         | b          | 2018-05-01 | 2018-04-18 |   10 |     100 |    10
 3 | a         | c          | 2018-01-02 | 2018-04-18 |   12 |     100 |     9
 4 | a         | d          | 2018-01-03 | 2018-04-18 |   12 |     100 |     8
 5 | b         | e          | 2018-02-25 | 2018-04-18 |   12 |     100 |     7
 6 | a         | a          | 2018-01-01 | 2018-04-17 |   40 |     200 |    30
 7 | a         | b          | 2018-05-01 | 2018-04-17 |    0 |      95 |     8
 8 | a         | c          | 2018-01-02 | 2018-04-17 |   10 |      12 |     5
 9 | a         | d          | 2018-01-03 | 2018-04-17 |    8 |      90 |     4
10 | b         | e          | 2018-02-25 | 2018-04-17 |    9 |       0 |     3
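Because the figures are cumulative, per-day numbers in a report come from differencing consecutive snapshots. A minimal sketch using the two rows for vendor a / product a in the example above (cost 40 on 2018-04-17, 50 on 2018-04-18):

```python
# Snapshots for vendor a, product a, taken from the example:
# (t2, cumulative cost up to t2)
snapshots = [
    ("2018-04-17", 40),
    ("2018-04-18", 50),
]

# The daily figure is the difference between consecutive cumulative values.
deltas = []
for (_, c_prev), (d_cur, c_cur) in zip(snapshots, snapshots[1:]):
    deltas.append((d_cur, c_cur - c_prev))

print(deltas)  # [('2018-04-18', 10)]
```

In Postgres the same differencing can be done in SQL with a window function such as lag() over (partition by vendor_id, product_id order by t2).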
Cost and revenue come from the two separate tables, and I join them on vendor_id, product_id, and t2.
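A runnable sketch of that three-column join, again using sqlite3 as a stand-in for Postgres (table names cost_t and revenue_t are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cost_t (vendor_id TEXT, product_id TEXT,
                         t1 TEXT, t2 TEXT, cost REAL);
    CREATE TABLE revenue_t (vendor_id TEXT, product_id TEXT,
                            t1 TEXT, t2 TEXT, revenue REAL, sales INTEGER);
    INSERT INTO cost_t VALUES ('a','a','2018-01-01','2018-04-18',50);
    INSERT INTO revenue_t VALUES ('a','a','2018-01-01','2018-04-18',200,34);
""")

# Join the two tables on vendor_id, product_id, and retrieval time t2.
row = conn.execute("""
    SELECT c.vendor_id, c.product_id, c.t2, c.cost, r.revenue, r.sales
    FROM cost_t c
    LEFT OUTER JOIN revenue_t r
      ON  r.vendor_id  = c.vendor_id
      AND r.product_id = c.product_id
      AND r.t2         = c.t2
""").fetchone()
print(row)  # ('a', 'a', '2018-04-18', 50.0, 200.0, 34)
```

With thousands of rows per day, a composite index on (vendor_id, product_id, t2) on both tables would keep this join fast.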
My current query looks something like this:
with report1 as (select ...),
report2 as (select ...)
select .. from report1 left outer join report2 on ...
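One hypothetical way to fill in that CTE skeleton: each report keeps only the latest snapshot (max t2) per vendor/product, since the cumulative values make older snapshots redundant for a current-state report. Run here against sqlite3 as a stand-in; table names and the max(t2) filter are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cost_t (vendor_id TEXT, product_id TEXT, t2 TEXT, cost REAL);
    CREATE TABLE revenue_t (vendor_id TEXT, product_id TEXT, t2 TEXT, revenue REAL);
    INSERT INTO cost_t VALUES
        ('a','a','2018-04-17',40), ('a','a','2018-04-18',50);
    INSERT INTO revenue_t VALUES
        ('a','a','2018-04-17',200), ('a','a','2018-04-18',200);
""")

# Each CTE filters to the latest snapshot per vendor/product, then the
# final select joins the two reports, as in the skeleton above.
rows = conn.execute("""
    WITH report1 AS (
        SELECT vendor_id, product_id, t2, cost
        FROM cost_t c
        WHERE t2 = (SELECT max(t2) FROM cost_t
                    WHERE vendor_id = c.vendor_id
                      AND product_id = c.product_id)
    ),
    report2 AS (
        SELECT vendor_id, product_id, t2, revenue
        FROM revenue_t r
        WHERE t2 = (SELECT max(t2) FROM revenue_t
                    WHERE vendor_id = r.vendor_id
                      AND product_id = r.product_id)
    )
    SELECT r1.vendor_id, r1.product_id, r1.t2, r1.cost, r2.revenue
    FROM report1 r1
    LEFT OUTER JOIN report2 r2
      ON  r2.vendor_id  = r1.vendor_id
      AND r2.product_id = r1.product_id
      AND r2.t2         = r1.t2
""").fetchall()
print(rows)  # [('a', 'a', '2018-04-18', 50.0, 200.0)]
```

On Postgres 10 the correlated max(t2) subqueries could instead be written with DISTINCT ON (vendor_id, product_id) ... ORDER BY vendor_id, product_id, t2 DESC, which is usually cleaner and faster there.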
Thanks!
JR