SQLAlchemy loses connection during query

I am working on a project that brings together several data sources keyed on registered users. One query in particular is giving me a lot of trouble:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from sqlalchemy import create_engine

# of course, the info here is obscured
prod_engine = create_engine(
    'mysql+mysqlconnector://password@host:3306/database',
    pool_timeout=3600, pool_recycle=3600)

query_users = """
SELECT
    users.id,
    CASE
        WHEN ((users.role = '' OR users.role IS NULL) AND users.plan LIKE 'pro%')
             OR users.role REGEXP '(pro|agent|manager)'
        THEN 'professional'
        ELSE 'consumer'
    END AS 'modified_role',
    users.created_at,
    users.logged_in_at AS 'last_login',
    COUNT(DISTINCT(folders.id)) AS 'folder_count',
    IF(COUNT(DISTINCT(folders.id)) > 1, '2 or more', '0 to 1') AS 'folder_group',
    MIN(folders.created_at) AS 'first_folder_created',
    MAX(folders.created_at) AS 'last_folder_created'
FROM users
LEFT OUTER JOIN folders
    ON folders.created_by = users.id
    AND folders.discarded = 0
    AND folders.created_at >= '2010-11-30 23:59:59'
WHERE users.invalid_email IS NULL
GROUP BY 1
"""

users = pd.read_sql_query(query_users, prod_engine)

No matter what I try, I get this error (almost always after about three seconds, sometimes instantly):

 InterfaceError: (InterfaceError) 2013: Lost connection to MySQL server during query 

I tried several things, for example adding the pool_timeout and pool_recycle options to the create_engine call, per the docs here: http://docs.sqlalchemy.org/en/latest/core/engines.html

I also tried users = pd.read_sql_query(query_users, prod_engine, chunksize=10000) but got the same error.
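For reference, this is the chunked-read pattern I mean (a runnable sketch using an in-memory SQLite database as a stand-in for prod_engine and the real query):

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stands in for the MySQL engine so this sketch runs anywhere.
engine = create_engine('sqlite://')
pd.DataFrame({'id': range(100)}).to_sql('users', engine, index=False)

# Stream the result set in chunks instead of one large transfer,
# then reassemble the pieces into a single DataFrame.
chunks = pd.read_sql_query('SELECT id FROM users', engine, chunksize=25)
users = pd.concat(chunks, ignore_index=True)
print(len(users))  # 100
```

With chunksize set, read_sql_query returns an iterator of DataFrames rather than one big frame, so each network transfer is smaller; here the error persisted anyway.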

Interestingly, this query runs fine in Sequel Pro: it starts returning rows immediately and takes only ~10 seconds to complete. The result is about 550,000 rows.

There are several other threads/posts I found, but none quite addresses what I need:
https://groups.google.com/forum/#!topic/sqlalchemy/TWL7aWab9ww
Disable SQLAlchemy
http://blog.fizyk.net.pl/blog/reminder-set-pool_recycle-for-sqlalchemys-connection-to-mysql.html

Reading the MySQL docs here (http://dev.mysql.com/doc/refman/5.5/en/error-lost-connection.html), I noticed this passage:

Sometimes the "during query" form happens when millions of rows are being sent as part of one or more queries. If you know that this is happening, you should try increasing net_read_timeout from its default of 30 seconds to 60 seconds or longer, sufficient for the data transfer to complete.

It looks like I may need to change this setting, but I can't find anything in the SQLAlchemy docs that mentions it.
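SQLAlchemy doesn't expose net_read_timeout as an engine option, but one way to apply per-session MySQL settings is to run a SET statement on every new DBAPI connection via a "connect" event listener. Below is a sketch of that pattern; an in-memory SQLite engine (and a PRAGMA) stands in so it runs anywhere, and the timeout value of 600 seconds is an assumption, not a recommendation:

```python
from sqlalchemy import create_engine, event

# SQLite stand-in; with the question's MySQL engine you would execute
# "SET SESSION net_read_timeout = 600" in the listener instead of the PRAGMA.
engine = create_engine('sqlite://')

@event.listens_for(engine, 'connect')
def set_session_options(dbapi_connection, connection_record):
    # Fires once for each new DBAPI connection the pool creates.
    cursor = dbapi_connection.cursor()
    cursor.execute('PRAGMA busy_timeout = 600000')  # MySQL: SET SESSION net_read_timeout = 600
    cursor.close()

with engine.connect() as conn:
    # The listener has already fired; confirm the setting took effect.
    value = conn.exec_driver_sql('PRAGMA busy_timeout').scalar()
    print(value)  # 600000
```

The same effect can often be had without code by raising net_read_timeout globally in the server's my.cnf.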

Has anyone encountered this problem before? If so, how did you fix it?

1 answer

Check the MySQL server variable max_allowed_packet and increase it. In most cases MySQL drops the connection during a query because a packet in the payload exceeds that limit.
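Checking and raising the limit looks roughly like this (a sketch: SET GLOBAL needs a privileged account and does not survive a server restart, so the durable place for the setting is my.cnf; the 256 MiB value is an example, not a recommendation):

```sql
SHOW VARIABLES LIKE 'max_allowed_packet';
-- example: raise to 256 MiB (equivalently max_allowed_packet=256M in my.cnf)
SET GLOBAL max_allowed_packet = 268435456;
```

Note that existing connections keep the old value; reconnect after changing it.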


Source: https://habr.com/ru/post/1210754/
