MySQL ResultSets are, by default, retrieved in full from the server before any work can be done. With huge result sets, this becomes unusable. I would like instead to actually retrieve the rows one by one from the server.
In Java, following the instructions here (in the "ResultSet" section), I create the statement as follows:
    stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                                java.sql.ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);
This works well in Java. My question is: is there a way to do the same in Python?
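One candidate I have come across is MySQLdb's SSCursor, which apparently uses mysql_use_result instead of mysql_store_result so that rows stream from the server, though I am not sure it is really equivalent to the Java fetch-size trick. A sketch of what I mean:

    import MySQLdb
    import MySQLdb.cursors

    # Use a server-side cursor for every cursor created on this connection
    conn = MySQLdb.connect(user="user", passwd="password", db="mydb",
                           cursorclass=MySQLdb.cursors.SSCursor)
    cur = conn.cursor()
    cur.execute("SELECT item FROM items")
    row = cur.fetchone()
    while row is not None:
        # process the row here; rows arrive from the server one at a time
        row = cur.fetchone()
    cur.close()   # all rows must be consumed before issuing another query
    conn.close()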
One thing I tried was to limit the query to 1000 rows at a time, for example:
    start_row = 0
    while True:
        cursor = conn.cursor()
        cursor.execute("SELECT item FROM items LIMIT %d,1000" % start_row)
        rows = cursor.fetchall()
        if not rows:
            break
        start_row += 1000
However, it seems to get slower and slower as start_row increases.
And no, using fetchone() instead of fetchall() does not change anything.
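If I understand MySQL's LIMIT correctly, a large offset makes the server scan and throw away start_row rows on every iteration, which would explain the slowdown. A keyset-style variant that paginates on an indexed primary key (assuming items has an integer id column, which is my assumption here) should avoid the growing offset:

    import MySQLdb

    conn = MySQLdb.connect(user="user", passwd="password", db="mydb")
    last_id = 0
    while True:
        cursor = conn.cursor()
        # Seek past the last id seen instead of using an ever-growing offset
        cursor.execute(
            "SELECT id, item FROM items WHERE id > %s ORDER BY id LIMIT 1000",
            (last_id,))
        rows = cursor.fetchall()
        if not rows:
            break
        for row in rows:
            pass  # process row[1] here
        last_id = rows[-1][0]
        cursor.close()
    conn.close()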
Clarification:
The naive code that I use to reproduce this problem is as follows:
    import MySQLdb

    conn = MySQLdb.connect(user="user", passwd="password", db="mydb")
    cur = conn.cursor()

    print "Executing query"
    cur.execute("SELECT * FROM bigtable")

    print "Starting loop"
    row = cur.fetchone()
    while row is not None:
        print ", ".join([str(c) for c in row])
        row = cur.fetchone()

    cur.close()
    conn.close()
On a table of 700,000 rows, this code runs quickly. But on a table of 9,000,000 rows it prints "Executing query" and then hangs for a long time. That is why it makes no difference whether I use fetchone() or fetchall().
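One way to confirm where the time goes is to time execute() separately from the fetch loop; with the default cursor I expect all the waiting to happen inside execute() itself. A quick diagnostic sketch:

    import time
    import MySQLdb

    conn = MySQLdb.connect(user="user", passwd="password", db="mydb")
    cur = conn.cursor()

    t0 = time.time()
    cur.execute("SELECT * FROM bigtable")  # with the default cursor, this is where the freeze happens
    print "execute() took %.1f s" % (time.time() - t0)

    t0 = time.time()
    row = cur.fetchone()  # should return almost instantly once execute() finishes
    print "first fetchone() took %.1f s" % (time.time() - t0)

    cur.close()
    conn.close()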