SQL query for index / primary key

In our online competition system, the standings table with whole columns (user_id, score) often changes. Both are indexed with a unique restriction. Two types of queries are required:

  • Given a score not in the table, return the position based on 1, which will occupy the score if it is inserted.
  • Given user_id in the table, return the position of the corresponding rating.

In both cases, the position in relation to the account increases: a new assessment, smaller than everything currently in the table, will have position 1.

Here's the tricky part: we probably can't afford to scan the table. The table can have up to 10 million records, and we need to process at least 40 queries per second.

How to do it in PostgreSQL?

I have a non-SQL solution in Berkeley DB that uses B-trees with support for logical records. It easily has good performance. But we would like to get rid of BDB by re-implementing it using a PostgreSQL query. I tried the obvious

 select 1+count(*) from standings where score < ? limit 1; 

This causes a table scan.

I expect the answer to be β€œin no way” because the BDB logical record set facility requires locking the entire B-tree for each edit. To get O (log N) performance, it relies on the number of sheets in each node. All these calculations on the way to the root should change with each editing; hence locking. Such a lock contradicts PostgreSQL design principles and, possibly, any multi-user database.

So, if the problem cannot be solved with PostgreSQL, confirming this is the next best result of this question.

+3
source share
1 answer

With a regular table, you cannot do in PostgreSQL 9.1. count() scans the table because indexes do not have visibility information. To verify that rows are not deleted, PostgreSQL should visit the table.

If the table is read-only (or rarely updated), you can add a row number to the table. Then a query like:

 SELECT rownumber+1 FROM standings WHERE score < ? ORDER BY score DESC LIMIT 1; 

With an index:

 CREATE INDEX standings_score_idx ON standings (score DESC); 

I would get the result almost instantly. However, this is not an option for a table with a write load for obvious reasons. So not for you.


The good news: one of the main new features of the upcoming PostgreSQL 9.2 is right for you: " Coverage Index " or " Index-Only Scan ". I have cited release notes for 9.2 here :

Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)

This is often referred to as "index-only crawl" or "spanning indexes". This is possible for a bunch of pages with exceptionally fully visible tuples, as reported using a visibility map. The visibility map was made emergency as a necessary part of the implementation of this function.

This Robert Haas blog post provides more details on how this affects count performance. It helps to work even with the WHERE , as in your case.

+2
source

Source: https://habr.com/ru/post/953944/


All Articles