How to read all rows from a huge table?

I have a problem processing all rows from a database (PostgreSQL). I get the error: org.postgresql.util.PSQLException: Ran out of memory retrieving query results. I thought I would need to read all the rows in small chunks, but that doesn't work: it only reads 100 rows (code below). How can I do this?

  int i = 0;
  Statement s = connection.createStatement();
  s.setMaxRows(100); // because of: org.postgresql.util.PSQLException: Ran out of memory retrieving query results.
  ResultSet rs = s.executeQuery("select * from " + tabName);
  for (;;) {
      while (rs.next()) {
          i++;
          // do something...
      }
      if ((s.getMoreResults() == false) && (s.getUpdateCount() == -1)) {
          break;
      }
  }
+48
java postgresql jdbc
Sep 10 '10 at 6:23
6 answers

Use a CURSOR in PostgreSQL, or let the JDBC driver handle this for you.

LIMIT and OFFSET will get slow when processing large data sets, because the database still has to scan past all the skipped rows for each page.
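
For the explicit-cursor route, here is a minimal sketch (assuming an open java.sql.Connection conn on the pgJDBC driver; the table mytable, the cursor name big_cur and the batch size of 1000 are placeholders):

  // Cursors only exist inside a transaction, so autocommit must be off.
  conn.setAutoCommit(false);
  try (Statement st = conn.createStatement()) {
      st.execute("DECLARE big_cur NO SCROLL CURSOR FOR SELECT * FROM mytable");
      while (true) {
          // Pull the next batch of rows through the cursor.
          try (ResultSet rs = st.executeQuery("FETCH FORWARD 1000 FROM big_cur")) {
              int fetched = 0;
              while (rs.next()) {
                  fetched++;
                  // process the current row here
              }
              if (fetched == 0) {
                  break; // cursor exhausted
              }
          }
      }
      st.execute("CLOSE big_cur");
  }
  conn.commit();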

+38
Sep 10 '10 at 6:35

The short version: call stmt.setFetchSize(50) and conn.setAutoCommit(false) so that the driver does not read the entire ResultSet into memory.

Here's what the docs say:

Retrieving cursor-based results

By default the driver collects all the results for a query at once. This can be inefficient for large data sets, so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

A small number of rows are cached on the client side of the connection, and when exhausted the next block of rows is retrieved by repositioning the cursor.

Remarks:

  • Cursor based ResultSets cannot be used in all situations. There are a number of restrictions which will make the driver silently fall back to fetching the whole ResultSet at once.

  • The connection to the server must be made using the V3 protocol. This is the default for (and only supported by) server versions 7.4 and later.

  • The Connection must not be in autocommit mode. The backend closes cursors at the end of transactions, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it.

  • The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code will need to be rewritten to take advantage of this, but it also means that you cannot scroll backwards or otherwise jump around in the ResultSet.

  • The query given must be a single statement, not multiple statements strung together with semicolons.

Example 5.2. Setting the fetch size to turn cursors on and off.

Changing the code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).

 // make sure autocommit is off
 conn.setAutoCommit(false);
 Statement st = conn.createStatement();

 // Turn use of the cursor on.
 st.setFetchSize(50);
 ResultSet rs = st.executeQuery("SELECT * FROM mytable");
 while (rs.next()) {
     System.out.print("a row was returned.");
 }
 rs.close();

 // Turn the cursor off.
 st.setFetchSize(0);
 rs = st.executeQuery("SELECT * FROM mytable");
 while (rs.next()) {
     System.out.print("many rows were returned.");
 }
 rs.close();

 // Close the statement.
 st.close();



+64
Sep 12 '10 at 10:35

So it turns out that the crux of the problem is this: by default, the connection runs in autocommit mode, and Postgres needs cursors to be able to stream through the data in chunks (e.g. read the first 10K results, then the next block, then the next); cursors, however, can only exist inside a transaction. So the default is to read all rows into RAM, and only then let your program process "the first result row, then the second" after everything has arrived, for two reasons: you are not in a transaction (so cursors don't work), and the fetch size has not been set.

The way the psql command-line tool achieves batched responses for queries (its FETCH_COUNT setting) is to "wrap" your select query in a short-lived transaction (if one isn't already open) so that cursors can work. You can do something similar with JDBC:

  static void readLargeQueryInChunksJdbcWay(Connection conn, String originalQuery, int fetchCount,
          ConsumerWithException<ResultSet, SQLException> consumer) throws SQLException {
      boolean originalAutoCommit = conn.getAutoCommit();
      if (originalAutoCommit) {
          conn.setAutoCommit(false); // start temp transaction
      }
      try (Statement statement = conn.createStatement()) {
          statement.setFetchSize(fetchCount);
          ResultSet rs = statement.executeQuery(originalQuery);
          while (rs.next()) {
              consumer.accept(rs); // or just do your work here
          }
      } finally {
          if (originalAutoCommit) {
              conn.setAutoCommit(true); // reset it; this also ends (commits) the temp transaction
          }
      }
  }

  @FunctionalInterface
  public interface ConsumerWithException<T, E extends Exception> {
      void accept(T t) throws E;
  }
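
For illustration, a hypothetical call to this helper could look like the following (the URL, credentials and big_table are placeholders, and the calling method must declare throws SQLException):

  try (Connection conn = DriverManager.getConnection(
          "jdbc:postgresql://localhost/mydb", "user", "password")) {
      readLargeQueryInChunksJdbcWay(conn, "select * from big_table", 1000,
              rs -> System.out.println(rs.getString(1))); // e.g. print the first column
  }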

This gives the advantage of using less RAM, and in my results it also ran faster overall, even when RAM wasn't a concern (strange, but that's what I measured). It also has the advantage that your processing of the first row "starts sooner", since the driver hands you one page of results at a time.

And here's how to do it as a "raw postgres cursor", along with full demo code, though in my experiments the JDBC way above seemed slightly faster for whatever reason.

Another option would be to have autocommit mode off everywhere, though you then still have to manually set the fetchSize for each new Statement (or you can set a default fetch size in the URL string).
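
On that last point, recent pgJDBC versions accept a defaultRowFetchSize connection parameter, so a sketch like this (URL and credentials are placeholders; check that your driver version supports the parameter) applies a fetch size to every statement by default:

  String url = "jdbc:postgresql://localhost/mydb?defaultRowFetchSize=1000";
  Connection conn = DriverManager.getConnection(url, "user", "password");
  conn.setAutoCommit(false); // still required, or the driver falls back to fetching everything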

+6
Nov 27 '17 at 18:26

I think your question is similar to this thread: JDBC Pagination, which contains solutions for what you need.

For PostgreSQL in particular, you can use the LIMIT and OFFSET keywords in your query: http://www.petefreitag.com/item/451.cfm

PS: In your Java code, I suggest you use a PreparedStatement instead of a plain Statement: http://download.oracle.com/javase/tutorial/jdbc/basics/prepared.html
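
Putting both suggestions together, a minimal LIMIT/OFFSET paging sketch could look like this (reusing connection and tabName from the question; the page size of 100 is arbitrary, and as noted above, OFFSET gets slower the deeper you page):

  int pageSize = 100;
  try (PreparedStatement ps = connection.prepareStatement(
          "select * from " + tabName + " order by id limit ? offset ?")) {
      for (int offset = 0; ; offset += pageSize) {
          ps.setInt(1, pageSize);
          ps.setInt(2, offset);
          int rowsInPage = 0;
          try (ResultSet rs = ps.executeQuery()) {
              while (rs.next()) {
                  rowsInPage++;
                  // process the current row here
              }
          }
          if (rowsInPage < pageSize) {
              break; // last (partial or empty) page reached
          }
      }
  }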

+2
Sep 10 '10 at 6:28

I did this as shown below. Not the best way, I think, but it works :)

  Connection c = DriverManager.getConnection("jdbc:postgresql://....");
  PreparedStatement s = c.prepareStatement(
          "select * from " + tabName + " where id > ? order by id");
  s.setMaxRows(100);
  int lastId = 0;
  for (;;) {
      s.setInt(1, lastId);
      ResultSet rs = s.executeQuery();
      int lastIdBefore = lastId;
      while (rs.next()) {
          lastId = Integer.parseInt(rs.getObject(1).toString());
          // ...
      }
      if (lastIdBefore == lastId) {
          break;
      }
  }
0
Sep 12 '10 at 10:28

In my case, the problem was probably on the client side, which was trying to fetch all the results at once.

I needed to get a .csv file with ALL the results.

I found a solution using

 psql -U postgres -d dbname -c "COPY (SELECT * FROM T) TO STDOUT WITH DELIMITER ','" 

(where dbname is the database name) and redirect the output to a file.
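
If you need to do the same from Java rather than the shell, pgJDBC exposes the COPY protocol through its CopyManager API; here is a minimal sketch (URL, credentials and the output path are placeholders, and this assumes the pgjdbc driver is on the classpath):

  import java.io.FileWriter;
  import java.io.Writer;
  import java.sql.Connection;
  import java.sql.DriverManager;
  import org.postgresql.PGConnection;
  import org.postgresql.copy.CopyManager;

  public class CopyToCsv {
      public static void main(String[] args) throws Exception {
          try (Connection conn = DriverManager.getConnection(
                       "jdbc:postgresql://localhost/dbname", "postgres", "password");
               Writer out = new FileWriter("all_rows.csv")) {
              // COPY streams rows from the server without materializing them client-side.
              CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
              copyManager.copyOut("COPY (SELECT * FROM T) TO STDOUT WITH DELIMITER ','", out);
          }
      }
  }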

0
Feb 14 '13 at 19:27


