Using JDBC setMaxRows in a database-independent application

I am trying to write a database-independent application with JDBC. I need a way to fetch the top N records from a table. I saw that JDBC has a setMaxRows method, but I don't feel comfortable using it because I am afraid the database will produce the full result set and only the JDBC driver will truncate it. If I need the top 5 results from a table with a billion rows, this will break my neck (the table has a usable index).

Writing special SQL statements for each type of database is not very pleasant, but it would allow the database to do smart query planning and stop fetching more results than necessary.

Can I rely on setMaxRows to tell the database not to do more work than needed?

I guess that in the worst case I cannot rely on it working the way I hope. I'm mostly interested in Postgres 9.1 and Oracle 11.2, so if anyone has experience with these databases, please step forward.

+6
3 answers

"... will allow the database to do smart query planning and stop fetching more results than necessary."

If you use

PostgreSQL:

 SELECT * FROM tbl ORDER BY col1 LIMIT 10; -- slow without index 

Or:

 SELECT * FROM tbl LIMIT 10; -- fast even without index 

Oracle:

 SELECT * FROM (SELECT * FROM tbl ORDER BY col1 DESC) WHERE ROWNUM <= 10; 

... then only 10 rows will be returned. But if you sort the rows before picking the top 10, basically all qualifying rows have to be read before they can be sorted.

Matching indexes can prevent this overhead!
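To keep the per-database SQL in one place, the two statements above could be generated by a small helper. This is only a minimal sketch: the class TopNSql, the Dialect enum and both method names are hypothetical, and it assumes the driver reports a product name containing "PostgreSQL" or "Oracle" through DatabaseMetaData.getDatabaseProductName().

```java
import java.util.Locale;

// Hypothetical helper: wraps an already ordered query in the
// database-specific "top N" syntax shown above.
final class TopNSql {

    // Dialects covered by this sketch (not a JDBC type).
    enum Dialect { POSTGRESQL, ORACLE }

    // Maps the string from DatabaseMetaData.getDatabaseProductName()
    // to a dialect; the matching rule is an assumption.
    static Dialect fromProductName(String productName) {
        String name = productName.toLowerCase(Locale.ROOT);
        if (name.contains("postgresql")) return Dialect.POSTGRESQL;
        if (name.contains("oracle")) return Dialect.ORACLE;
        throw new IllegalArgumentException("unsupported database: " + productName);
    }

    // orderedQuery must not end with a semicolon.
    static String topN(Dialect dialect, String orderedQuery, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        switch (dialect) {
            case POSTGRESQL:
                return orderedQuery + " LIMIT " + n;
            case ORACLE:
                // ROWNUM is assigned before sorting, so the ordered
                // query has to go into a subquery.
                return "SELECT * FROM (" + orderedQuery + ") WHERE ROWNUM <= " + n;
            default:
                throw new IllegalArgumentException("unsupported dialect: " + dialect);
        }
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM tbl ORDER BY col1";
        System.out.println(topN(Dialect.POSTGRESQL, query, 10));
        System.out.println(topN(Dialect.ORACLE, query, 10));
    }
}
```

In an application, the dialect would be resolved once per connection, for example with TopNSql.fromProductName(conn.getMetaData().getDatabaseProductName()).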


If you are not sure what JDBC actually sends to the server, run a test and have the database log the statements it receives. In PostgreSQL, you can set in postgresql.conf:

 log_statement = all 

(and reload the configuration) to log all statements sent to the server. Be sure to reset this setting after the test, or the log files may grow very large.

+3

The thing that can (and will) kill you with a billion rows is the (very likely) ORDER BY in your query. If that ordering cannot be produced from an index, then... it will break your neck :)

I would not depend on the JDBC driver. As a previous comment noted, it is unclear what it actually does (across different RDBMSs).

If you are concerned about the speed of your query, you can use LIMIT. With LIMIT you can at least be sure that the limit is passed to the database server.

Edit: Sorry, I did not know that Oracle does not support LIMIT.

+1

In direct answer to your question regarding PostgreSQL 9.1: yes, the JDBC driver will tell the server to stop generating rows beyond the limit you set.
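For reference, this is roughly how setMaxRows is used from JDBC. It is a sketch only: the class MaxRowsExample and the method firstRows are made-up names, and whether the limit reaches the server or is applied by the driver remains driver-specific, so it is worth checking against the server's statement log.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating Statement.setMaxRows.
final class MaxRowsExample {

    // Runs sql and returns the first column of at most maxRows rows.
    static List<String> firstRows(Connection conn, String sql, int maxRows)
            throws SQLException {
        List<String> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            // Caps the rows any ResultSet of this statement can contain;
            // rows beyond the cap are silently dropped.
            ps.setMaxRows(maxRows);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getString(1));
                }
            }
        }
        return rows;
    }
}
```

A call such as MaxRowsExample.firstRows(conn, "SELECT col1 FROM tbl ORDER BY col1", 5) keeps the SQL itself portable; the cost is that the query text gives the planner no hint that only five rows are wanted.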

As others have pointed out, depending on the indexes and the chosen plan, the server may scan a very large number of rows to find the five you want. Proper server configuration may help it model costs accurately and avoid this, but if the distribution of values is unusual, you may need to introduce an optimization barrier (for example, using a CTE) to coerce the planner into a good plan.

+1

Source: https://habr.com/ru/post/913328/

