Using Python MySQLdb SSCursor with nested queries

A typical MySQLdb query buffers the entire result set in memory, which uses a lot of memory and performs poorly when a large result set is generated. For instance:

    cursor.execute("SELECT id, name FROM `table`")
    for i in xrange(cursor.rowcount):
        id, name = cursor.fetchone()
        print id, name

There is also an alternative cursor, SSCursor, which retrieves one row at a time, really speeding up the script and significantly reducing its memory use:

    import MySQLdb
    import MySQLdb.cursors

    conn = MySQLdb.connect(user="user", passwd="password", db="dbname",
                           cursorclass=MySQLdb.cursors.SSCursor)
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM users")
    row = cur.fetchone()
    while row is not None:
        doSomething()
        row = cur.fetchone()
    cur.close()
    conn.close()

But I cannot find anything about using SSCursor with nested queries. If this is the definition of doSomething():

    def doSomething():
        cur2 = conn.cursor()
        cur2.execute('select id, x, y from table2')
        rows = cur2.fetchall()
        for row in rows:
            doSomethingElse(row)
        cur2.close()

then the script produces the following error:

 _mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now") 

It sounds like SSCursor is incompatible with nested queries. Is that true? If so, that is too bad, because the main loop is too slow with the standard cursor.

1 answer

This issue is discussed in the MySQLdb user guide, under the threadsafety attribute (emphasis mine):

The MySQL protocol can not handle multiple threads using the same connection at once. Some earlier versions of MySQLdb utilized locking to achieve a threadsafety of 2. While this is not terribly hard to accomplish using the standard Cursor class (which uses mysql_store_result()), it is complicated by SSCursor (which uses mysql_use_result(); with the latter you must ensure all the rows have been read before another query can be executed).

The MySQL C API documentation for mysql_use_result() gives more information about your error message:

When using mysql_use_result(), you must execute mysql_fetch_row() until a NULL value is returned, otherwise, the unfetched rows are returned as part of the result set for your next query. The C API gives the error Commands out of sync; you can't run this command now if you forget to do this!

In other words, you must completely fetch the result set from any unbuffered cursor (i.e., one that uses mysql_use_result() instead of mysql_store_result(); with MySQLdb, that means SSCursor and SSDictCursor) before you can execute another statement over the same connection.

In your situation, the most direct solution would be to open a second connection to use while iterating over the result set of the unbuffered query. (It would not work to simply get a buffered cursor from the same connection; you would still have to advance past the unbuffered result set before using the buffered cursor.)
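A rough sketch of that two-connection approach (the credentials and table names are placeholders from your example, and the imports live inside the function only so the sketch can be defined without a live MySQL server):

```python
def stream_with_nested_queries():
    # Sketch only: user/passwd/db and table names are placeholders.
    import MySQLdb
    import MySQLdb.cursors

    # Unbuffered connection: streams the big result set one row at a time.
    conn_stream = MySQLdb.connect(user="user", passwd="password", db="dbname",
                                  cursorclass=MySQLdb.cursors.SSCursor)
    # Second, buffered connection for the nested queries, so the
    # unbuffered result set on conn_stream is never interrupted.
    conn_inner = MySQLdb.connect(user="user", passwd="password", db="dbname")

    cur = conn_stream.cursor()
    cur.execute("SELECT id, name FROM users")
    for row in cur:  # SSCursor is iterable; fetches one row at a time
        cur2 = conn_inner.cursor()
        cur2.execute("select id, x, y from table2")
        for inner_row in cur2.fetchall():
            pass  # doSomethingElse(inner_row)
        cur2.close()

    cur.close()
    conn_stream.close()
    conn_inner.close()
```

The key point is that cur2 belongs to conn_inner, so executing it never collides with the half-read mysql_use_result() stream on conn_stream.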

If your workflow is something like "loop through a big result set, executing N small queries for each row," consider looking into MySQL stored procedures as an alternative to nesting cursors from different connections. You can still use MySQLdb to call the procedure and get the results, although you will definitely want to read the documentation of MySQLdb's callproc() method, since it does not conform to the Python database API specification when retrieving the procedure's results.


The second alternative is to stick with buffered cursors, but split your query into batches. This is what I ended up doing for a project last year, where I needed to loop through millions of rows, parse some of the data with an in-house module, and execute several INSERT and UPDATE queries after processing each row. The general idea looks something like this:

    QUERY = r"SELECT id, name FROM `table` WHERE id BETWEEN %s and %s;"
    BATCH_SIZE = 5000
    i = 0
    while True:
        cursor.execute(QUERY, (i + 1, i + BATCH_SIZE))
        result = cursor.fetchall()

        # If there's no possibility of a gap as large as BATCH_SIZE in your
        # table ids, you can test to break out of the loop like this
        # (otherwise, adjust accordingly):
        if not result:
            break

        for row in result:
            doSomething()

        i += BATCH_SIZE
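The id-range arithmetic can be checked without a database. A minimal pure-Python sketch of the batching (the helper name batch_ranges is mine, and it assumes the maximum id is known up front rather than detected by an empty batch):

```python
def batch_ranges(batch_size, max_id):
    """Yield the inclusive (low, high) id ranges the loop above would query."""
    i = 0
    while i < max_id:
        yield (i + 1, i + batch_size)
        i += batch_size

# With a batch size of 4 over a table whose ids run 1..10:
print(list(batch_ranges(4, 10)))  # [(1, 4), (5, 8), (9, 12)]
```

Note that the last range may extend past the largest id; that is harmless, since BETWEEN simply matches nothing beyond the table's actual ids.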

One other thing I would point out about your code example: you can iterate directly over a cursor in MySQLdb, instead of calling fetchone() explicitly over xrange(cursor.rowcount). This is especially important when using an unbuffered cursor, because the rowcount attribute is undefined and will give a very unexpected result (see Python MysqlDB using cursor.rowcount with SSDictCursor giving wrong count).
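Since a DB-API cursor is iterable, the pattern is simply for row in cursor. A sketch of that (any iterable stands in for the cursor here, so this runs without a database; the helper name consume is mine):

```python
def consume(cursor, handle):
    # Iterate the cursor directly instead of relying on cursor.rowcount,
    # which is undefined for unbuffered (SSCursor) result sets.
    for row in cursor:
        handle(row)

# Any iterable can stand in for a cursor when trying this out:
seen = []
consume(iter([(1, 'alice'), (2, 'bob')]), seen.append)
print(seen)  # [(1, 'alice'), (2, 'bob')]
```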


Source: https://habr.com/ru/post/1202266/
