Problem Statement
I have a table "event_statistics" with the following definition:
CREATE TABLE public.event_statistics (
    id int4 NOT NULL DEFAULT nextval('event_statistics_id_seq'::regclass),
    client_id int4 NULL,
    session_id int4 NULL,
    action_name text NULL,
    value text NULL,
    product_id int8 NULL,
    product_options jsonb NOT NULL DEFAULT '{}'::jsonb,
    url text NULL,
    url_options jsonb NOT NULL DEFAULT '{}'::jsonb,
    visit int4 NULL DEFAULT 0,
    date_update timestamptz NULL,
    CONSTRAINT event_statistics_pkey PRIMARY KEY (id),
    CONSTRAINT event_statistics_client_id_session_id_sessions_client_id_id_for
        FOREIGN KEY (client_id, session_id) REFERENCES <?>() ON DELETE CASCADE ON UPDATE CASCADE
)
WITH (
    OIDS=FALSE
);
CREATE INDEX regdate ON public.event_statistics (date_update timestamptz_ops);
And the clients table:
CREATE TABLE public.clients (
    id int4 NOT NULL DEFAULT nextval('clients_id_seq'::regclass),
    client_name text NULL,
    client_hash text NULL,
    CONSTRAINT clients_pkey PRIMARY KEY (id)
)
WITH (
    OIDS=FALSE
);
CREATE INDEX clients_client_name_idx ON public.clients (client_name text_ops);
I need to count the events in the "event_statistics" table for each "action_name", grouped into "date_update" buckets of a given time step, for a specific client.
The goal is to show each client statistics for all relevant events on the information panel of our website, with the ability to select the report dates. The time step of the chart should depend on the length of the selected interval:
- current day - counting for each hour;
- 1+ day and <= 1 month - calculation for each day;
- 1+ month and <= 6 months - ;
- 6+ months - .
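The interval rules above can be sketched as a single expression that picks a date_trunc step from the length of the selected range. This is only an illustration: the sample dates are made up, the 'hour' and 'day' branches follow the list, and the 'week' and 'month' steps for the two longer ranges are placeholders assumed here because the list does not state them.

```sql
-- Pick a date_trunc() step from the length of the report range.
-- date_from / date_to are hypothetical sample values.
SELECT CASE
           WHEN d.date_to - d.date_from <= interval '1 day'    THEN 'hour'
           WHEN d.date_to - d.date_from <= interval '1 month'  THEN 'day'
           WHEN d.date_to - d.date_from <= interval '6 months' THEN 'week'  -- assumed step
           ELSE 'month'                                                      -- assumed step
       END AS step
FROM (VALUES (timestamptz '2024-01-01', timestamptz '2024-01-08')) AS d(date_from, date_to);
```

The resulting text value can be passed as the first argument of date_trunc(), and a matching interval ('1 hour', '1 day', and so on) as the step of generate_series().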
My current query:
SELECT t.date, A.actionName, count(E.id)
FROM generate_series(current_date - interval '1 week', now(), interval '1 day') as t(date)
cross join
    (values
        ('page_open'),
        ('product_add'),
        ('product_buy'),
        ('product_event'),
        ('product_favourite'),
        ('product_open'),
        ('product_share'),
        ('session_start')) as A(actionName)
left join
    (select action_name, date_trunc('day', e.date_update) as dateTime, e.id
     from event_statistics as e
     where e.client_id = (select id from clients as c where c.client_name = 'client name')
       and (date_update between (current_date - interval '1 week') and now())) E
    on t.date = E.dateTime and A.actionName = E.action_name
group by A.actionName, t.date
order by A.actionName, t.date;
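The query above joins every matching event row to the date/action grid and only then counts. A possible variant, shown here as an untested sketch rather than a measured improvement, counts per (action_name, day) inside the subquery first, so the outer join only sees one row per group. The aliases es, e, day, cnt and event_count are introduced for this example; since id is the primary key, count(*) in the subquery gives the same numbers as count(E.id) in the original.

```sql
SELECT t.date,
       a.action_name,
       coalesce(e.cnt, 0) AS event_count      -- 0 for grid cells with no events
FROM generate_series(current_date - interval '1 week', now(), interval '1 day') AS t(date)
CROSS JOIN (VALUES
        ('page_open'),
        ('product_add'),
        ('product_buy'),
        ('product_event'),
        ('product_favourite'),
        ('product_open'),
        ('product_share'),
        ('session_start')) AS a(action_name)
LEFT JOIN (
        -- aggregate before joining: one row per action and day
        SELECT es.action_name,
               date_trunc('day', es.date_update) AS day,
               count(*) AS cnt
        FROM event_statistics AS es
        WHERE es.client_id = (SELECT c.id FROM clients AS c WHERE c.client_name = 'client name')
          AND es.date_update BETWEEN current_date - interval '1 week' AND now()
        GROUP BY es.action_name, date_trunc('day', es.date_update)
     ) AS e
     ON e.day = t.date AND e.action_name = a.action_name
ORDER BY a.action_name, t.date;
```

Whether the planner picks a cheaper plan for this form depends on the data and the available indexes, so it would need to be checked with explain (analyze).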
, 10 . , , , , ( , , , ).
The query plan:
GroupAggregate (cost=171937.16..188106.84 rows=1600 width=44)
  Group Key: "*VALUES*".column1, t.date
  InitPlan 1 (returns $0)
    -> Seq Scan on clients c (cost=0.00..1.07 rows=1 width=4)
         Filter: (client_name = 'client name'::text)
  -> Merge Left Join (cost=171936.08..183784.31 rows=574060 width=44)
       Merge Cond: (("*VALUES*".column1 = e.action_name) AND (t.date = (date_trunc('day'::text, e.date_update))))
       -> Sort (cost=628.77..648.77 rows=8000 width=40)
            Sort Key: "*VALUES*".column1, t.date
            -> Nested Loop (cost=0.02..110.14 rows=8000 width=40)
                 -> Function Scan on generate_series t (cost=0.02..10.02 rows=1000 width=8)
                 -> Materialize (cost=0.00..0.14 rows=8 width=32)
                      -> Values Scan on "*VALUES*" (cost=0.00..0.10 rows=8 width=32)
       -> Materialize (cost=171307.32..171881.38 rows=114812 width=24)
            -> Sort (cost=171307.32..171594.35 rows=114812 width=24)
                 Sort Key: e.action_name, (date_trunc('day'::text, e.date_update))
                 -> Index Scan using regdate on event_statistics e (cost=0.57..159302.49 rows=114812 width=24)
                      Index Cond: ((date_update > (('now'::cstring)::date - '7 days'::interval)) AND (date_update <= now()))
                      Filter: (client_id = $0)
"event_statistics" 50 , , .
, .
, stackoverflow , , :
- client_id
- , , ( , ) -)
- ( Intel Xeon E7-4850 2.00GHz, 6 -, )
- OLAP, Postgres-XL
- ?
btree event_statistics (client_id asc, action_name asc, date_update asc, id). , , .
?
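For reference, the btree index described above could be created with a statement along the following lines. This is a sketch: the index name is hypothetical, and the column list simply follows the description.

```sql
-- Hypothetical index name; columns as listed in the description above.
CREATE INDEX event_statistics_client_action_date_idx
    ON public.event_statistics
    USING btree (client_id ASC, action_name ASC, date_update ASC, id);
```

On a large, busy table the same statement can be written as CREATE INDEX CONCURRENTLY to avoid blocking writes while the index is built, at the price of a slower build.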
Update
explain (analyze, verbose):
GroupAggregate (cost=860934.44..969228.46 rows=1600 width=44) (actual time=52388.678..54671.187 rows=64 loops=1)
  Output: t.date, "*VALUES*".column1, count(e.id)
  Group Key: "*VALUES*".column1, t.date
  InitPlan 1 (returns $0)
    -> Seq Scan on public.clients c (cost=0.00..1.07 rows=1 width=4) (actual time=0.058..0.059 rows=1 loops=1)
         Output: c.id
         Filter: (c.client_name = 'client name'::text)
         Rows Removed by Filter: 5
  -> Merge Left Join (cost=860933.36..940229.77 rows=3864215 width=44) (actual time=52388.649..54388.698 rows=799737 loops=1)
       Output: t.date, "*VALUES*".column1, e.id
       Merge Cond: (("*VALUES*".column1 = e.action_name) AND (t.date = (date_trunc('day'::text, e.date_update))))
       -> Sort (cost=628.77..648.77 rows=8000 width=40) (actual time=0.190..0.244 rows=64 loops=1)
            Output: t.date, "*VALUES*".column1
            Sort Key: "*VALUES*".column1, t.date
            Sort Method: quicksort Memory: 30kB
            -> Nested Loop (cost=0.02..110.14 rows=8000 width=40) (actual time=0.059..0.080 rows=64 loops=1)
                 Output: t.date, "*VALUES*".column1
                 -> Function Scan on pg_catalog.generate_series t (cost=0.02..10.02 rows=1000 width=8) (actual time=0.043..0.043 rows=8 loops=1)
                      Output: t.date
                      Function Call: generate_series(((('now'::cstring)::date - '7 days'::interval))::timestamp with time zone, now(), '1 day'::interval)
                 -> Materialize (cost=0.00..0.14 rows=8 width=32) (actual time=0.002..0.003 rows=8 loops=8)
                      Output: "*VALUES*".column1
                      -> Values Scan on "*VALUES*" (cost=0.00..0.10 rows=8 width=32) (actual time=0.004..0.005 rows=8 loops=1)
                           Output: "*VALUES*".column1
       -> Materialize (cost=860304.60..864168.81 rows=772843 width=24) (actual time=52388.441..54053.748 rows=799720 loops=1)
            Output: e.id, e.date_update, e.action_name, (date_trunc('day'::text, e.date_update))
            -> Sort (cost=860304.60..862236.70 rows=772843 width=24) (actual time=52388.432..53703.531 rows=799720 loops=1)
                 Output: e.id, e.date_update, e.action_name, (date_trunc('day'::text, e.date_update))
                 Sort Key: e.action_name, (date_trunc('day'::text, e.date_update))
                 Sort Method: external merge Disk: 39080kB
                 -> Index Scan using regdate on public.event_statistics e (cost=0.57..753018.26 rows=772843 width=24) (actual time=31.423..44284.363 rows=799720 loops=1)
                      Output: e.id, e.date_update, e.action_name, date_trunc('day'::text, e.date_update)
                      Index Cond: ((e.date_update >= (('now'::cstring)::date - '7 days'::interval)) AND (e.date_update <= now()))
                      Filter: (e.client_id = $0)
                      Rows Removed by Filter: 2983424
Planning time: 7.278 ms
Execution time: 54708.041 ms
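Two things stand out in this plan: the index scan on regdate finishes at about 44.3 s of the 54.7 s total and discards 2983424 rows on the client_id filter, and the sort of the remaining 799720 rows spills to disk (external merge, 39080kB). One way to see how much the spill alone costs is to rerun the sorted part of the query with a larger work_mem inside a transaction. The sketch below assumes '128MB' as an arbitrary test value, not a recommendation.

```sql
BEGIN;
SHOW work_mem;                      -- current setting, for comparison
SET LOCAL work_mem = '128MB';       -- assumed test value; reverts at ROLLBACK
EXPLAIN (ANALYZE, BUFFERS)
SELECT e.action_name,
       date_trunc('day', e.date_update) AS day,
       e.id
FROM event_statistics AS e
WHERE e.client_id = (SELECT c.id FROM clients AS c WHERE c.client_name = 'client name')
  AND e.date_update BETWEEN current_date - interval '1 week' AND now()
ORDER BY e.action_name, date_trunc('day', e.date_update);
ROLLBACK;
```

If the "Sort Method" line changes from external merge to an in-memory method while the index scan time stays about the same, the scan and its filter, not the sort, remain the dominant cost.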