How to get a row-by-row MySQL ResultSet in Python

MySQL ResultSets are, by default, fully retrieved from the server before any work can be done. With huge result sets, this becomes unusable. I would instead like to retrieve the rows one by one from the server.

In Java, following the instructions here (in the "ResultSet" section), I create the statement as follows:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                            java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

This works well in Java. My question is: is there a way to do the same in Python?

One thing I tried was to limit the query to 1000 rows at a time, for example:

start_row = 0
while True:
    cursor = conn.cursor()
    cursor.execute("SELECT item FROM items LIMIT %d,1000" % start_row)
    rows = cursor.fetchall()
    if not rows:
        break
    start_row += 1000
    # Do something with rows...

However, it seems to get slower as start_row increases.

And no, using fetchone() instead of fetchall() does not change anything.

Clarification:

The naive code that I use to reproduce this problem is as follows:

import MySQLdb

conn = MySQLdb.connect(user="user", passwd="password", db="mydb")
cur = conn.cursor()
print "Executing query"
cur.execute("SELECT * FROM bigtable")
print "Starting loop"
row = cur.fetchone()
while row is not None:
    print ", ".join([str(c) for c in row])
    row = cur.fetchone()
cur.close()
conn.close()

On a table of 700,000 rows, this code runs quickly. But on a table of 9,000,000 rows it prints "Executing query" and then hangs for a long time. That is why it makes no difference whether I use fetchone() or fetchall().

+45
python mysql
Dec 03 '08 at 15:23
5 answers

I think you need to connect passing cursorclass=MySQLdb.cursors.SSCursor:

MySQLdb.connect(user="user", passwd="password", db="mydb",
                cursorclass=MySQLdb.cursors.SSCursor)

The default cursor fetches all the data at once, even if you don't use fetchall.

Edit: use SSCursor or any other cursor class that supports server-side result sets; check the module docs for MySQLdb.cursors.
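
Putting the whole pattern together, here is a minimal sketch reusing the connection settings and the bigtable table from the question:

import MySQLdb
import MySQLdb.cursors

# Every cursor created from this connection is an SSCursor, i.e. a
# server-side, unbuffered cursor.
conn = MySQLdb.connect(user="user", passwd="password", db="mydb",
                       cursorclass=MySQLdb.cursors.SSCursor)
cur = conn.cursor()
cur.execute("SELECT * FROM bigtable")

# Rows now stream from the server as you fetch them, instead of being
# buffered in client memory before the loop starts.
row = cur.fetchone()
while row is not None:
    # ... process row ...
    row = cur.fetchone()
cur.close()
conn.close()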

+49
Dec 03 '08 at 16:17

The limit/offset solution runs in quadratic time because MySQL has to re-scan the rows to find the offset. As you suspected, the default cursor stores the entire result set on the client, which can consume a lot of memory.
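
As an aside, if you do want to stay with batched queries, the quadratic cost can be avoided by seeking on an indexed column instead of using an offset (keyset pagination). A sketch, assuming items has an indexed numeric id column (an assumption, not something stated in the question):

last_id = 0
while True:
    cursor = conn.cursor()
    # Seek past the last row seen; with an index on id this does not
    # re-scan the skipped rows the way LIMIT offset,1000 does.
    cursor.execute("SELECT id, item FROM items WHERE id > %s "
                   "ORDER BY id LIMIT 1000", (last_id,))
    rows = cursor.fetchall()
    if not rows:
        break
    last_id = rows[-1][0]
    # Do something with rows...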

Instead, you can use a server-side cursor, which keeps the query running on the server and fetches results only as needed. The cursor class can be customized either by supplying a default to the connect call itself or by passing the class to the cursor method each time.

from MySQLdb import cursors
cursor = conn.cursor(cursors.SSCursor)

But that is not the whole story. In addition to storing the MySQL result on the client, the default cursor actually fetches every row eagerly. This behavior is undocumented and very unfortunate. It means that full Python objects are created for all rows, which consumes much more memory than the raw MySQL result.

In most cases, a result stored on the client and wrapped as an iterator gives the best speed with reasonable memory usage. But you'll have to roll your own if you want that.
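
One way to roll your own along those lines is MySQLdb's low-level _mysql wrapper, which stores the raw result on the client but only builds Python objects as you fetch. A sketch, with process() as a hypothetical placeholder:

import _mysql

db = _mysql.connect(user="user", passwd="password", db="mydb")
db.query("SELECT * FROM bigtable")
result = db.store_result()  # raw result buffered client-side

# fetch_row() returns a tuple of at most maxrows rows (default 1),
# each row a tuple of strings; an empty tuple means no rows are left.
rows = result.fetch_row()
while rows:
    process(rows[0])
    rows = result.fetch_row()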

+17
Dec 03 '08 at 17:09

Have you tried this version of the fetchone loop? Or something different?

row = cursor.fetchone()
while row is not None:
    # process row ...
    row = cursor.fetchone()

Also have you tried this?

# fetchmany returns an empty list, not None, when the rows run out,
# so test truthiness rather than comparing against None.
rows = cursor.fetchmany(size=1)
while rows:
    # process rows[0] ...
    rows = cursor.fetchmany(size=1)

Not all drivers support these, so you may get errors or find them too slow.




Edit

If it freezes on execute, you are waiting for the database. That is not a row-by-row Python thing; that is a MySQL thing.

MySQL likes to fetch all rows as part of its own cache management. This is turned off by setting fetch_size to Integer.MIN_VALUE (-2147483648).

The question is, which part of the Python DBAPI is the equivalent of the JDBC fetch_size?

I think it could be the arraysize attribute of the cursor. Try

cursor.arraysize = -2**31

and see if that forces MySQL to stream the result set instead of caching it.
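
A sketch of that experiment, reusing the question's connection settings (note that PEP 249 only defines arraysize as the default batch size for fetchmany, so a driver is free to ignore a negative value):

import MySQLdb

conn = MySQLdb.connect(user="user", passwd="password", db="mydb")
cur = conn.cursor()
cur.arraysize = -2**31  # speculative mirror of JDBC setFetchSize(Integer.MIN_VALUE)
cur.execute("SELECT * FROM bigtable")
row = cur.fetchone()
while row is not None:
    # ... process row ...
    row = cur.fetchone()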

+7
Dec 03 '08 at 15:40

Try using MySQLdb.cursors.SSDictCursor

con = MySQLdb.connect(host=host, user=user, passwd=pwd,
                      charset=charset, port=port,
                      cursorclass=MySQLdb.cursors.SSDictCursor)
cur = con.cursor()
cur.execute("select f1, f2 from table")
for row in cur:
    print row['f1'], row['f2']
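
One caveat worth knowing with server-side cursors (SSCursor and SSDictCursor alike): the whole result has to be read, or the cursor closed, before another query can run on the same connection, otherwise MySQL complains about commands being out of sync. A defensive sketch, reusing the connection above:

cur = con.cursor()
try:
    cur.execute("select f1, f2 from table")
    for row in cur:
        print row['f1'], row['f2']
finally:
    # Closing the cursor discards any unread server-side rows, so the
    # connection can safely be reused afterwards.
    cur.close()
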
+1
Sep 10 '13 at 6:49

I got the best results by mixing together a few of the other answers.

This involved setting cursorclass=MySQLdb.cursors.SSDictCursor (for MySQLdb) or pymysql.cursors.SSDictCursor (for PyMySQL) as part of the connection settings. This lets the server hold the query results ("SS" stands for server-side, as opposed to the default cursor, which brings the results over to the client side) and builds a dictionary out of each row (e.g. {'id': 1, 'name': 'Cookie Monster'}).

Then, when looping through the rows, there was an infinite loop in both Python 2.7 and 3.4 caused by while rows is not None, because even when cur.fetchmany(size=10000) was called and there were no results left, the method returned an empty list ([]) instead of None.

Actual example:

 query = """SELECT * FROM my_table""" conn = pymysql.connect(host=MYSQL_CREDENTIALS['host'], user=MYSQL_CREDENTIALS['user'], passwd=MYSQL_CREDENTIALS['passwd'], charset='utf8', cursorclass = pymysql.cursors.SSDictCursor) cur = conn.cursor() results = cur.execute(query) rows = cur.fetchmany(size=100) while rows: for row in rows: process(row) rows = cur.fetchmany(size=100) cur.close() conn.close() 
+1
Oct 25 '14 at 3:16


