Providing a fast `select count(*)` in a web application

I am updating an application that supports a national engineering competition, moving it from a local server to the cloud.

To tell a team where it currently stands, the query has the form

    select 1 + count(*) from team where score < ?

Team scores change very dynamically. There can be up to 2 million teams, and I need to process at least 10 such queries per second.

The original gets the necessary performance (in fact, it already did so on 1999 hardware) by using a separate Berkeley DB of team/score records. Berkeley DB has a "record number" feature that provides exactly the right functionality and is very fast.

Heroku does not seem to be able to support Berkeley DB. PostgreSQL, their standard DB, does select count(*) with a full table or index scan, which is too slow.

Any ideas on how to proceed? I am not attached to Heroku, but I do have to move to some kind of cloud solution.

+4
4 answers

Use Redis to store your team data in a sorted set. The ZRANK command will then return the count you need. Redis is very fast, and ZRANK is O(log N); sorted sets are implemented using skip lists.
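To make the idea concrete without requiring a Redis server, here is a minimal in-memory sketch in Python. It is a stand-in, not Redis: the `ScoreBoard` class and its method names are hypothetical, and a plain sorted list (O(N) inserts) replaces the skip list. In Redis itself the corresponding commands would be ZADD to set a score and ZRANK to read the position; note that ZRANK breaks ties by member name, so if teams can share a score, ZCOUNT with an exclusive upper bound is closer to a strict `score < ?` count.

```python
import bisect


class ScoreBoard:
    """In-memory stand-in for a score-ordered sorted set (not Redis itself)."""

    def __init__(self):
        self._scores = []    # every team's score, kept sorted ascending
        self._by_team = {}   # team id -> current score

    def set_score(self, team, score):
        """Insert a team, or move an existing team to a new score."""
        old = self._by_team.get(team)
        if old is not None:
            # Remove one occurrence of the old score before re-inserting.
            del self._scores[bisect.bisect_left(self._scores, old)]
        bisect.insort(self._scores, score)
        self._by_team[team] = score

    def position(self, team):
        """Same result as: select 1 + count(*) from team where score < ?"""
        score = self._by_team[team]
        return 1 + bisect.bisect_left(self._scores, score)


board = ScoreBoard()
board.set_score("alpha", 30)
board.set_score("beta", 10)
board.set_score("gamma", 20)
board.set_score("beta", 40)      # beta's score changes

print(board.position("alpha"))   # 2: only gamma (20) is below 30
```

The lookup is a binary search, which is where the logarithmic cost comes from; a skip list or balanced tree makes the updates logarithmic as well.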

+2

Create a rank table and update it as often as possible. Include the category (open or official) and the score so that you do not have to join the team table at query time:

    -- One precomputed row per team
    create table "rank" (
        team integer primary key,
        category integer,
        score integer,
        rank_consolidated integer,
        rank_category integer
    );

    -- Rebuild the whole table in a single transaction
    begin;
    truncate table "rank";
    insert into "rank" (team, category, score, rank_consolidated, rank_category)
    select
        team, category, score,
        rank() over (order by score desc) as rank_consolidated,
        rank() over (partition by category order by score desc) as rank_category
    from team;
    commit;

    -- Lookup by primary key
    select * from "rank" where team = 11;

For the exact ranking semantics, look at the window functions.
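As a small illustration of how the ranking window functions treat ties, the sketch below runs them on a four-row sample. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added); the sample data is made up. `rank()`, `dense_rank()` and `row_number()` exist with the same meaning in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table team (team integer primary key, score integer)")
conn.executemany(
    "insert into team values (?, ?)",
    [(1, 90), (2, 90), (3, 80), (4, 70)],  # teams 1 and 2 are tied
)

rows = conn.execute("""
    select team,
           score,
           rank()       over (order by score desc)       as rnk,
           dense_rank() over (order by score desc)       as dense_rnk,
           row_number() over (order by score desc, team) as row_num
    from team
    order by score desc, team
""").fetchall()

for row in rows:
    print(row)
# (1, 90, 1, 1, 1)
# (2, 90, 1, 1, 2)
# (3, 80, 3, 2, 3)
# (4, 70, 4, 3, 4)
```

`rank()` leaves a gap after a tie (1, 1, 3), `dense_rank()` does not (1, 1, 2), and `row_number()` never repeats a value, so the choice depends on how tied teams should be reported.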

+2

Putting an index on the score column should avoid a full table scan.

0

If the data is read far more often than it is written, and it must always be up to date, then this is an ideal job for a trigger-maintained summary table (like a materialized view).

Put a trigger on the team table that fires AFTER INSERT OR UPDATE OR DELETE ... FOR EACH ROW and runs a trigger function that updates the team_summary row for that team with its new score.

The team_summary table can be accessed with a simple index equality lookup, so it will be insanely fast. Since PostgreSQL supports concurrent readers and writers, the team_summary table will remain responsive even if it is updated very frequently. The only things you need to do to get the best results are to set FILLFACTOR to something like 50 on the team_summary table so that HOT can work well, and to make sure autovacuum is configured to run quite often to spread out the vacuum I/O load.

Writing the trigger should be pretty trivial. You just need to be careful to write a concurrency-safe trigger that will not break if the same team is updated by several connections in parallel. Something like:

 UPDATE team_summary SET score = score + 1 WHERE team_id = NEW.team_id; 

should be fine under both SERIALIZABLE and READ COMMITTED isolation. See the "Concurrency Control" chapter of the PostgreSQL documentation. The only hard bit is that you must always insert a new row into team_summary before inserting the first row for a new team into team, so that your trigger does not have to handle the surprisingly difficult case where the team_summary row for that team does not exist yet. Getting upsert/merge right for this is quite difficult.
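The trigger-maintained summary described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a server, the column layout (`id`, `team_id`, `score`) and the trigger name are hypothetical, and it covers only the INSERT case. In PostgreSQL the same logic would live in a PL/pgSQL trigger function attached with CREATE TRIGGER, and a single-connection demo like this says nothing about behavior under concurrent writers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one row in team per scoring event.
    create table team (
        id      integer primary key,
        team_id integer not null
    );

    -- One row per team, found by a primary-key equality lookup.
    create table team_summary (
        team_id integer primary key,
        score   integer not null default 0
    );

    -- Keep the summary current on every insert into team.
    create trigger team_after_insert after insert on team
    for each row
    begin
        update team_summary set score = score + 1
        where team_id = new.team_id;
    end;
""")

# As the answer advises, create the summary row before the first team row.
conn.execute("insert into team_summary (team_id) values (?)", (11,))
conn.executemany("insert into team (team_id) values (?)", [(11,), (11,), (11,)])

score = conn.execute(
    "select score from team_summary where team_id = ?", (11,)
).fetchone()[0]
print(score)  # 3: the trigger fired once per inserted row
```

Because the update is written as a relative change (`score = score + 1`) rather than a read followed by a write of an absolute value, the database applies each increment to the latest row version, which is the property the answer relies on.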

If the write rate is also very high and you can get away with results that are only refreshed every few seconds or minutes, use Clodoaldo's approach (the periodically rebuilt rank table).

0

Source: https://habr.com/ru/post/1438266/

