How is using the LIMIT clause different from just getting N results?

In my application, I have a table of answers on a topic. The structure looks something like this:

 CREATE TABLE responses (
     id        INT NOT NULL PRIMARY KEY,
     topic_id  INT NOT NULL,
     author_id INT NOT NULL,
     response  TEXT
 );

id is an auto-increment column, topic_id and author_id are foreign keys with the corresponding indexes, and so on.

I always want to order by insertion time, usually newest first. In most cases I also filter on topic_id. A typical query looks like this:

 SELECT * FROM responses
 WHERE topic_id = 123
 ORDER BY id DESC
 LIMIT 20;

 -- or, for pagination:
 SELECT * FROM responses
 WHERE topic_id = 123 AND id < 456789
 ORDER BY id DESC
 LIMIT 20;
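
For this access pattern, a composite index on (topic_id, id) lets the database read a topic's newest rows in order; a minimal sketch, assuming only the single-column foreign-key indexes exist so far:

 CREATE INDEX responses_topic_id_id_idx ON responses (topic_id, id DESC);
 -- With this index, Postgres can walk a topic's newest rows directly instead
 -- of sorting them; a plain (topic_id, id) index also works, since btree
 -- indexes can be scanned backward.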

I want to implement a block list: each user has a list of author_ids they do not want to see. I need to get the top 20 results, excluding responses from those author_ids as well as the responses that reply to them.

Determining whether to exclude a row is fairly involved, and although it could be done in the database (either in PL/SQL or by preprocessing), I want to keep that logic in the application. So I can do one of two things:

  • Drop the LIMIT clause and leave the query unbounded: consume rows until 20 valid results have been counted, then close the query.
  • Apply chunking: specify LIMIT 40 and hope that is enough for 20 "good" results; if not, fetch the next 40, and so on (see the sketch after this list).
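
A minimal sketch of the chunked variant, assuming keyset pagination on id; :last_seen_id stands in for a bind parameter supplied by the application:

 -- First chunk: over-fetch and apply the block-list filter in the application.
 SELECT * FROM responses
 WHERE topic_id = 123
 ORDER BY id DESC
 LIMIT 40;

 -- If fewer than 20 rows survive the filter, continue below the smallest id
 -- seen so far (:last_seen_id is a placeholder, not literal syntax).
 SELECT * FROM responses
 WHERE topic_id = 123 AND id < :last_seen_id
 ORDER BY id DESC
 LIMIT 40;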

What is the practical difference between the two, especially in terms of performance with many concurrent users?

I do this in PostgreSQL, but I am open to switching to another RDBMS. (I do not want to lose referential integrity, so I am not looking at NoSQL solutions.) Perhaps I will need to tune some database parameters (for example, prefetch sizes) to make the most of an unlimited query?
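
For reference, one way to consume an unbounded query incrementally in PostgreSQL is an explicit cursor; a minimal sketch (the cursor name is illustrative, and a plain WITHOUT HOLD cursor must live inside a transaction):

 BEGIN;
 DECLARE resp_cur CURSOR FOR
     SELECT * FROM responses
     WHERE topic_id = 123
     ORDER BY id DESC;
 FETCH 40 FROM resp_cur;  -- repeat until 20 rows pass the block-list filter
 CLOSE resp_cur;
 COMMIT;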

2 answers

I can't speak to Postgres specifics, but the query optimizer is likely to use the LIMIT clause as part of costing the various execution plans.

If you...

 select ... from ... where ... limit n 

then the optimizer knows that you will only fetch n rows, but for ...

 select ... from ... where ... 

the optimizer may assume that you need the entire result set, which it might estimate at several thousand rows.

In particular, I would expect the RDBMS to favor index-based access methods when a LIMIT clause is applied.
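
A quick way to check this in PostgreSQL is EXPLAIN; a sketch against the table from the question:

 EXPLAIN SELECT * FROM responses WHERE topic_id = 123 ORDER BY id DESC;
 EXPLAIN SELECT * FROM responses WHERE topic_id = 123 ORDER BY id DESC LIMIT 20;
 -- With LIMIT, the plan gains a Limit node, and the planner tends to prefer
 -- an index scan that can stop early over sorting the full result set.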


Adding a block list in SQL is not difficult.

 SELECT * FROM responses
 WHERE topic_id = 123
   AND author_id NOT IN (SELECT author_id FROM blocked WHERE user_id = X)
 ORDER BY id DESC
 LIMIT 20;

Just add NOT IN to the WHERE clause.
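
If the block lists grow large, an anti-join with NOT EXISTS is a common alternative that the planner usually handles well; a sketch, assuming the same blocked(user_id, author_id) table and the X placeholder from above:

 SELECT r.*
 FROM responses r
 WHERE r.topic_id = 123
   AND NOT EXISTS (
       SELECT 1 FROM blocked b
       WHERE b.user_id = X AND b.author_id = r.author_id
   )
 ORDER BY r.id DESC
 LIMIT 20;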

If for some reason you cannot do this, then your chunking idea is the better one. You do not want to leave the query unlimited, because then the database stands ready to return everything, whether or not the client or server ever asks for more.


Source: https://habr.com/ru/post/1444885/
