It turns out the crux of the problem is that Postgres (via JDBC) starts in autoCommit mode by default, and that it needs/uses cursors to be able to scroll through results in batches (for example: read the first 10K rows, then the next batch, then the next), but cursors can only exist inside a transaction. So by default the driver reads all rows into RAM, and only then lets your program start processing "the first row of the result, then the second," for two reasons: you are not in a transaction (so cursors don't work), and no fetch size has been set.
The way the psql command-line tool achieves a batched response for queries (its FETCH_COUNT setting) is to "wrap" your SELECT queries in a short-lived transaction (if one isn't already open) so that cursors can work. You can do something similar with JDBC:
    static void readLargeQueryInChunksJdbcWay(Connection conn, String originalQuery,
            int fetchCount, ConsumerWithException<ResultSet, SQLException> consumer) throws SQLException {
        boolean originalAutoCommit = conn.getAutoCommit();
        if (originalAutoCommit) {
            conn.setAutoCommit(false); // start temp transaction
        }
        try (Statement statement = conn.createStatement()) {
            statement.setFetchSize(fetchCount);
            ResultSet rs = statement.executeQuery(originalQuery);
            while (rs.next()) {
                consumer.accept(rs); // or just do your work here
            }
        } finally {
            if (originalAutoCommit) {
                conn.setAutoCommit(true); // reset it; also ends (commits) the temp transaction
            }
        }
    }

    @FunctionalInterface
    public interface ConsumerWithException<T, E extends Exception> {
        void accept(T t) throws E;
    }
This has the advantage of using less RAM and, according to my measurements, it generally also ran faster, even when you don't need to save RAM (which is odd). It also means your processing of the first row "starts sooner," since results arrive one page at a time.
And here is how to do it with a "raw" Postgres cursor instead, along with complete demo code, though in my experiments the JDBC approach above seemed slightly faster for some reason.
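The raw-cursor variant can be sketched roughly like this (the cursor name, the fetchSql helper, and the overall shape are my own; DECLARE, FETCH, and CLOSE are the actual Postgres commands involved):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CursorDemo {

    // Builds the FETCH statement for one batch; this helper name is my own invention.
    static String fetchSql(int fetchCount, String cursorName) {
        return "FETCH FORWARD " + fetchCount + " FROM " + cursorName;
    }

    // Rough sketch of reading a large query via an explicit server-side cursor.
    static void readViaRawCursor(Connection conn, String query, int fetchCount)
            throws SQLException {
        conn.setAutoCommit(false); // cursors only live inside a transaction
        try (Statement st = conn.createStatement()) {
            st.execute("DECLARE big_cur CURSOR FOR " + query);
            while (true) {
                try (ResultSet rs = st.executeQuery(fetchSql(fetchCount, "big_cur"))) {
                    boolean gotAny = false;
                    while (rs.next()) {
                        gotAny = true;
                        // process the current row here
                    }
                    if (!gotAny) {
                        break; // an empty batch means the cursor is exhausted
                    }
                }
            }
            st.execute("CLOSE big_cur");
        } finally {
            conn.commit(); // end the transaction (this also closes the cursor)
        }
    }
}
```

Each FETCH FORWARD round-trips to the server for one batch, which is one plausible reason the setFetchSize approach above (where the driver pipelines the fetches for you) came out a little faster.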
Another option would be to run with autoCommit turned off everywhere, though you would still have to manually specify a fetchSize for each new statement (or you can set a default fetch size in the connection URL string).
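For the URL route, the pgjdbc driver understands a defaultRowFetchSize connection parameter; a minimal sketch (host, database, and credentials here are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class FetchSizeUrlDemo {

    // defaultRowFetchSize is a real pgjdbc URL parameter; host/db/credentials are placeholders.
    static final String URL =
            "jdbc:postgresql://localhost:5432/mydb?defaultRowFetchSize=1000";

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(URL, "user", "secret")) {
            conn.setAutoCommit(false); // still required, or the fetch size has no effect
            // every statement created from conn now defaults to a fetch size of 1000
        }
    }
}
```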
rogerdpack Nov 27 '17 at 18:26