In my application I have a table of responses to topics. The structure looks something like this:
CREATE TABLE responses (
    id INT NOT NULL PRIMARY KEY,
    topic_id INT NOT NULL,
    author_id INT NOT NULL,
    response TEXT
);
id is an auto-increment column; topic_id and author_id are foreign keys with the corresponding indexes, and so on.
I always want to order by insertion time, usually newest first. In most cases I will filter on topic_id. A typical query looks like this:
SELECT * FROM responses WHERE topic_id=123 ORDER BY id DESC LIMIT 20;
I want to implement a block list: each user has a list of author_ids whose responses they do not want to see. I need to get the top 20 results, excluding responses from those author_ids as well as responses that reply to them.
Determining whether to exclude a row is fairly involved, and although it could be done in the database (in PL/SQL or by preprocessing), I want to keep the logic in the application. So I can do one of two things:
- Drop the LIMIT clause and run the query unbounded: consume rows until 20 valid results have been collected, then close the query.
- Fetch in chunks: specify LIMIT 40 and hope that is enough to yield 20 "good" results; if not, fetch the next 40, and so on. (Both approaches are sketched below.)
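To make the comparison concrete, here is a minimal sketch of both approaches, assuming Python with psycopg2 and a hypothetical is_blocked(row, blocklist) helper standing in for the application-side exclusion rule; these names are illustrative, not part of the question. For the chunked variant I use a keyset condition (id < last_id) instead of OFFSET, which is a substitution on my part:

import psycopg2

def is_blocked(row, blocklist):
    # Hypothetical application-side rule: exclude blocked authors
    # (a real version would also exclude responses that reply to them).
    return row[2] in blocklist              # row[2] is author_id in SELECT * order

def top_unbounded(conn, topic_id, blocklist, wanted=20):
    # Option 1: no LIMIT; read rows until enough valid ones, then stop.
    cur = conn.cursor()
    cur.execute(
        "SELECT * FROM responses WHERE topic_id = %s ORDER BY id DESC",
        (topic_id,))
    results = []
    for row in cur:
        if not is_blocked(row, blocklist):
            results.append(row)
            if len(results) == wanted:
                break
    cur.close()
    return results

def top_chunked(conn, topic_id, blocklist, wanted=20, chunk=40):
    # Option 2: fetch LIMIT-sized chunks, keyed on the last id seen,
    # until enough valid rows are collected or the rows run out.
    results = []
    last_id = None
    cur = conn.cursor()
    while len(results) < wanted:
        if last_id is None:
            cur.execute(
                "SELECT * FROM responses WHERE topic_id = %s "
                "ORDER BY id DESC LIMIT %s",
                (topic_id, chunk))
        else:
            cur.execute(
                "SELECT * FROM responses WHERE topic_id = %s AND id < %s "
                "ORDER BY id DESC LIMIT %s",
                (topic_id, last_id, chunk))
        rows = cur.fetchall()
        if not rows:
            break
        last_id = rows[-1][0]                # row[0] is id
        results.extend(r for r in rows if not is_blocked(r, blocklist))
    cur.close()
    return results[:wanted]

Note that with psycopg2's default (client-side) cursor, the unbounded query materializes the entire result set on the client before the loop even starts, so breaking out early saves nothing by itself; that is exactly where the driver/prefetch question at the end comes in.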
What is the practical difference between the two approaches, especially in terms of performance with many concurrent users?
I'm doing this in PostgreSQL, but I'm willing to switch to another RDBMS. (I don't want to lose referential integrity, so I'm not looking at NoSQL solutions.) Perhaps I would need to tune some database or driver parameters (for example, prefetch sizes) to make the most of the unbounded query?
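For PostgreSQL specifically, the driver-level knob closest to a "prefetch size" is a server-side (named) cursor. A minimal sketch with psycopg2, where itersize controls how many rows are pulled per network round trip (the cursor name, batch size, blocklist contents, and connection string are all placeholders):

import psycopg2

blocklist = {42, 99}                          # example blocked author_ids
conn = psycopg2.connect("dbname=app")         # connection details are placeholders
with conn.cursor(name="responses_cur") as cur:    # named => server-side cursor
    cur.itersize = 100                        # rows fetched per round trip
    cur.execute(
        "SELECT * FROM responses WHERE topic_id = %s ORDER BY id DESC",
        (123,))
    picked = []
    for row in cur:                           # streams in itersize-sized batches
        if row[2] not in blocklist:           # row[2] is author_id
            picked.append(row)
            if len(picked) == 20:
                break
conn.close()

Closing the named cursor after the break discards the rest of the result set on the server, so only the batches actually iterated ever cross the wire; that is the behaviour the unbounded-query plan relies on.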