How to efficiently search for the last record matching a condition in Rails and PostgreSQL?

Suppose you want to find the last record entered into the database (the one with the highest id) that matches a condition, for example: Model.where(:name => 'Joe') . There are 100,000 records, and many matches (say, thousands).

What is the most efficient way to do this? Does PostgreSQL have to find all matching records, or can it go straight to the last one? Is this a particularly slow query?

Working with Rails 3.0.7, Ruby 1.9.2 and PostgreSQL 8.3.

+4
3 answers

The important part here is to have a matching index. You can try this little test setup:

Create a schema x for testing:

 -- DROP SCHEMA x CASCADE; -- to wipe it all for a retest or when done.
 CREATE SCHEMA x;
 CREATE TABLE x.tbl(id serial, name text);

Insert 10,000 rows with distinct names:

 INSERT INTO x.tbl(name) SELECT 'x' || generate_series(1,10000); 

Insert another 10,000 rows with duplicated names:

 INSERT INTO x.tbl(name) SELECT 'y' || generate_series(1,10000)%20; 

Delete a random 10% to make it more realistic:

 DELETE FROM x.tbl WHERE random() < 0.1;
 ANALYZE x.tbl;
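
To sanity-check the test data (roughly 18,000 of the 20,000 rows should survive the random 10% delete), you can count the remaining rows:

 SELECT count(*) FROM x.tbl;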

The query could look like this:

 SELECT * FROM x.tbl WHERE name = 'y17' ORDER BY id DESC LIMIT 1; 

-> Total run time: 5.535 ms
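
The run times quoted here and below look like the "Total runtime" reported by EXPLAIN ANALYZE; to reproduce them on your own setup (exact numbers will vary), you could run:

 EXPLAIN ANALYZE
 SELECT * FROM x.tbl WHERE name = 'y17' ORDER BY id DESC LIMIT 1;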

 CREATE INDEX tbl_name_idx on x.tbl(name); 

-> Total run time: 1.228 ms

 DROP INDEX x.tbl_name_idx;
 CREATE INDEX tbl_name_id_idx on x.tbl(name, id);

-> Total run time: 0.053 ms

 DROP INDEX x.tbl_name_id_idx;
 CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);

-> Total run time: 0.048 ms

 DROP INDEX x.tbl_name_id_idx;
 CREATE INDEX tbl_name_idx on x.tbl(name);
 CLUSTER x.tbl using tbl_name_idx;

-> Total run time: 1.144 ms

 DROP INDEX x.tbl_name_idx;
 CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);
 CLUSTER x.tbl using tbl_name_id_idx;

-> Total run time: 0.047 ms

Conclusion

With a fitting index, the query runs more than 100 times faster.
The top performer is a multicolumn index with the filter column first and the sort column last.
Matching the sort order in the index (id DESC) only helps a little in this case.

Clustering helps with the simple index, because many rows still have to be read from the table, and after clustering they can be found in adjacent blocks. It does not help with the multicolumn index in this case, because only a single row has to be fetched from the table.
Read more about multicolumn indexes in the manual.

All these effects grow with the size of the table. 10,000 rows of two tiny columns is just a very small test case.
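
Applied to the question, a fitting index could be created like this (the table and column names here are assumptions based on the Model.where(:name => 'Joe') example; DESC in index column definitions needs PostgreSQL 8.3 or later):

 -- assumed table name "models" with columns "name" and "id"
 CREATE INDEX models_name_id_idx ON models (name, id DESC);

 SELECT * FROM models WHERE name = 'Joe' ORDER BY id DESC LIMIT 1;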

+7

You can put the query in Rails terms and the ORM will write the correct SQL:

 Model.where(:name=>"Joe").order('created_at DESC').first 

This should not result in all matching records being retrieved, or even in a table scan.
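
For reference, the SQL generated by that call is roughly equivalent to the following (quoting and the exact column list may differ):

 SELECT * FROM models WHERE name = 'Joe' ORDER BY created_at DESC LIMIT 1;

As shown in the first answer, this variant is only fast with a matching index, e.g. a multicolumn index on (name, created_at).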

+4

This is probably the easiest:

 SELECT [columns] FROM [table] WHERE [criteria] ORDER BY [id column] DESC LIMIT 1 

Note: indexing is important here. A huge table will be slow to search no matter how you write the query, unless it is indexed appropriately.
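
Filled in for the example from the question (the table and column names are assumptions), and assuming a multicolumn index like the one recommended in the first answer:

 SELECT *
 FROM   models
 WHERE  name = 'Joe'
 ORDER  BY id DESC
 LIMIT  1;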

-1
