Good question. I have been running into this problem for a long time.
Why is this happening?
You should look at how common the user='abcd' value is according to the statistics, as follows:
SELECT attname, null_frac, avg_width, n_distinct, most_common_vals, most_common_freqs, histogram_bounds FROM pg_stats WHERE tablename='T';
My guess is that this value is quite common, so you will find it in the most_common_vals output. Take the element at the same position in most_common_freqs to get the frequency of the value, then multiply it by the total number of rows (available from pg_class, column reltuples) to get the number of rows that are estimated to have the value 'abcd'.
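The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation, not of what the planner does internally; the function name estimated_rows and all the sample statistics (the other user names, their frequencies, and the reltuples value) are made up for the example. Only the 0.001 frequency comes from the question.

```python
def estimated_rows(value, most_common_vals, most_common_freqs, reltuples):
    """Return the frequency of `value` multiplied by the table's row count.

    Raises ValueError if `value` is not listed in most_common_vals.
    """
    position = most_common_vals.index(value)
    return round(most_common_freqs[position] * reltuples)


# Hypothetical pg_stats / pg_class contents, for illustration only.
mcv = ["alice", "abcd", "bob"]    # most_common_vals for the user column
freqs = [0.01, 0.001, 0.0005]     # most_common_freqs, same order as mcv
reltuples = 2_000_000             # assumed row count of the table

print(estimated_rows("abcd", mcv, freqs, reltuples))  # 2000
```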
The planner assumes that all values are uniformly distributed. In reality, of course, things are different. In addition, there are currently no correlated statistics (although some work is being done in this direction).
So, let's say the user='abcd' value has a frequency of 0.001 (as in the question) in the corresponding most_common_freqs entry. This means the value is expected to occur once every 1000 rows (assuming a uniform distribution). It seems that whichever way we scan the table, we will hit our user='abcd' within about 1000 rows. Sounds like it should be fast! The planner thinks the same and chooses the index on the timestamp column.
But that is not the case. Suppose your table T contains user activity logs and user='abcd' has been on vacation for the past 3 weeks. Then we will need to read quite a lot of rows from the timestamp index (3 weeks' worth of data) before we actually hit the row we need. You, as a DBA, know this, but the planner assumes a uniform distribution.
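To make the gap between the assumption and reality concrete, here is a small Python simulation. It is a toy model, not PostgreSQL code: the table size, the 300,000-row gap standing in for the "vacation", and the function name are all assumptions chosen for illustration. Both lists contain the same number of 'abcd' rows (frequency 0.001); only their position in the scan order differs.

```python
def rows_scanned_until_first_match(rows, wanted):
    """Count rows read, in index order, up to and including the first match."""
    for scanned, user in enumerate(rows, start=1):
        if user == wanted:
            return scanned
    return len(rows)  # no match: the whole index was read


TOTAL_ROWS = 1_000_000  # hypothetical table size
EVERY = 1_000           # frequency 0.001 means one match per 1000 rows
GAP = 300_000           # hypothetical number of rows logged during the vacation

# What the planner assumes: matches spread evenly through the scan order.
uniform = ["abcd" if i % EVERY == EVERY - 1 else "other"
           for i in range(TOTAL_ROWS)]

# The vacation case: same number of matches, but none in the first GAP rows
# that the index scan visits.
matches = TOTAL_ROWS // EVERY
skewed = (["other"] * GAP
          + ["abcd"] * matches
          + ["other"] * (TOTAL_ROWS - GAP - matches))

print(rows_scanned_until_first_match(uniform, "abcd"))  # 1000
print(rows_scanned_until_first_match(skewed, "abcd"))   # 300001
```

The statistics are identical in both cases, so the planner's estimate is identical too, yet the second scan reads about 300 times more rows before it can stop.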
So, how do you fix it?
You will need to trick the planner into doing what you need, because you know more about your data than it does.
Use the OFFSET 0 subquery trick:
SELECT * FROM ( SELECT * FROM T WHERE user='abcd' OFFSET 0 ) AS s ORDER BY timestamp LIMIT 1;
This trick prevents the subquery from being inlined, so the inner query is executed on its own.
Use a CTE (named subquery):
WITH s AS ( SELECT * FROM T WHERE user='abcd' ) SELECT * FROM s ORDER BY timestamp LIMIT 1;
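From the documentation:
A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries.
This holds for the PostgreSQL versions discussed here; since PostgreSQL 12 such a CTE may be inlined unless it is declared AS MATERIALIZED.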
Use count(*) for aggregate queries:
SELECT min(session_id), count(*)
This is not really applicable here, but I wanted to mention it.
And please consider upgrading to PostgreSQL 9.3.
P.S. There is more about row estimates in the documentation, of course.