I'm currently working on a complex sorting problem in Postgres 9.2. You can find the Source code used in this Question (simplified) here: http://sqlfiddle.com/#!12/9857e/11
I have a huge table (β 20Mio rows) containing different columns of different types.
CREATE TABLE data_table ( id bigserial PRIMARY KEY, column_a character(1), column_b integer
Suppose I want to sort this table over 2 columns ( ASC ). But I donβt want to do this simply with Order By, because later I may need to insert lines into the sorted output, and the user probably only wants to see 100 lines at once (sorted output).
To achieve these goals, I do the following:
CREATE TABLE meta_table ( id bigserial PRIMARY KEY, id_data bigint NOT NULL
And then I copy the data_table identifier to meta_table:
INSERT INTO meta_table(id_data) (SELECT id FROM data_table);
Later I can add extra rows to the table with a similar simple insert.
To get lines 900000 - 900099 ( 100 lines ), I can now use:
SELECT get_column_a(id_data), get_column_b(id_data), id_data FROM meta_table ORDER BY 1,2,3 OFFSET 900000 LIMIT 100;
(With an extra INNER JOIN on data_table if I want all the data.)
Resulting Plan:
Limit (cost=498956.59..499012.03 rows=100 width=8) -> Index Only Scan using meta_sort_index on meta_table (cost=0.00..554396.21 rows=1000000 width=8)
This is a pretty effective plan (only for scanning only in Postgres 9.2). But what if I want to get lines 20'000'000 - 20'000'099 ( 100 lines )? The same plan, much longer execution time. To improve offset performance ( Improve OFFSET performance in PostgreSQL ), I can do the following (suppose I save every 100'000 rows in a different table).
SELECT get_column_a(id_data), get_column_b(id_data), id_data FROM meta_table WHERE (get_column_a(id_data), get_column_b(id_data), id_data ) >= (get_column_a(587857), get_column_b(587857), 587857 ) ORDER BY 1,2,3 LIMIT 100;
It works a lot faster. Resulting Plan:
Limit (cost=0.51..61.13 rows=100 width=8) -> Index Only Scan using meta_sort_index on meta_table (cost=0.51..193379.65 rows=318954 width=8) Index Cond: (ROW((get_column_a(id_data)), (get_column_b(id_data)), id_data) >= ROW('Z'::bpchar, 27857, 587857))
So far, everything is working fine, and postgres is doing great!
Suppose I want to change the order of the second column to DESC .
But then I would have to change my WHERE clause because the Operator> Operator compares both ASC columns. The same request as above (ASC Ordering) can also be written as:
SELECT get_column_a(id_data), get_column_b(id_data), id_data FROM meta_table WHERE (get_column_a(id_data) > get_column_a(587857)) OR (get_column_a(id_data) = get_column_a(587857) AND ((get_column_b(id_data) > get_column_b(587857)) OR ( (get_column_b(id_data) = get_column_b(587857)) AND (id_data >= 587857)))) ORDER BY 1,2,3 LIMIT 100;
Now the plan changes and the request becomes slow:
Limit (cost=0.00..1095.94 rows=100 width=8) -> Index Only Scan using meta_sort_index on meta_table (cost=0.00..1117877.41 rows=102002 width=8) Filter: (((get_column_a(id_data)) > 'Z'::bpchar) OR (((get_column_a(id_data)) = 'Z'::bpchar) AND (((get_column_b(id_data)) > 27857) OR (((get_column_b(id_data)) = 27857) AND (id_data >= 587857)))))
How can I use an effective old plan with DESC-Ordering?
Do you have any better ideas on how to solve the problem?
(I already tried to declare my own type with my own operator classes, but this is too slow)