I use PostgreSQL and SQLAlchemy in a project that consists of the main process that starts the child processes. All of these processes access the database through SQLAlchemy.
I experience repeated connection failures: the first few child processes are working correctly, but after a while a connection error occurs. Here is the MWCE:
from sqlalchemy.ext.declarative import declarative_base from sqlalchemy import Column, Integer, create_engine from sqlalchemy.orm import sessionmaker DB_URL = 'postgresql://user: password@localhost /database' Base = declarative_base() class Dummy(Base): __tablename__ = 'dummies' id = Column(Integer, primary_key=True) value = Column(Integer) engine = None Session = None session = None def init(): global engine, Session, session engine = create_engine(DB_URL) Base.metadata.create_all(engine) Session = sessionmaker(bind=engine) session = Session() def cleanup(): session.close() engine.dispose() def target(id): init() try: dummy = session.query(Dummy).get(id) dummy.value += 1 session.add(dummy) session.commit() finally: cleanup() def main(): init() try: dummy = Dummy(value=1) session.add(dummy) session.commit() p = multiprocessing.Process(target=target, args=(dummy.id,)) p.start() p.join() session.refresh(dummy) assert dummy.value == 2 finally: cleanup() if __name__ == '__main__': i = 1 while True: print(i) main() i += 1
On my system (PostgreSQL 9.6, SQLAlchemy 1.1.4, psycopg2 2.6.2, Python 2.7, Ubuntu 14.04) this gives
1 2 3 4 5 6 7 8 9 10 11 Traceback (most recent call last): File "./fork_test.py", line 64, in <module> main() File "./fork_test.py", line 55, in main session.refresh(dummy) File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1422, in refresh only_load_props=attribute_names) is None: File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 223, in load_on_ident return q.one() File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2756, in one ret = self.one_or_none() File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2726, in one_or_none ret = list(self) File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2797, in __iter__ return self._execute_and_instances(context) File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2820, in _execute_and_instances result = conn.execute(querycontext.statement, self._params) File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 945, in execute return meth(self, multiparams, params) File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement compiled_sql, distilled_params File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context context) File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1393, in _handle_dbapi_exception exc_info File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause reraise(type(exception), exception, tb=exc_tb, cause=cause) File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context context) File "/home/vagrant/latest-sqlalchemy/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 469, in do_execute cursor.execute(statement, parameters) sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. [SQL: 'SELECT dummies.id AS dummies_id, dummies.value AS dummies_value \nFROM dummies \nWHERE dummies.id = %(param_1)s'] [parameters: {'param_1': 11074}]
This is repeatable and always crashing at the same iteration.
I am creating a new engine and session after the fork, as recommended by SQLAlchemy Documentation and elsewhere. Interestingly, the following slightly different approach does not crash:
import contextlib import multiprocessing import sqlalchemy from sqlalchemy.ext.declarative import declarative_base from sqlalchemy import Column, Integer, create_engine from sqlalchemy.orm import sessionmaker DB_URL = 'postgresql://user: password@localhost /database' Base = declarative_base() class Dummy(Base): __tablename__ = 'dummies' id = Column(Integer, primary_key=True) value = Column(Integer) @contextlib.contextmanager def get_session(): engine = sqlalchemy.create_engine(DB_URL) Base.metadata.create_all(engine) Session = sessionmaker(bind=engine) session = Session() try: yield session finally: session.close() engine.dispose() def target(id): with get_session() as session: dummy = session.query(Dummy).get(id) dummy.value += 1 session.add(dummy) session.commit() def main(): with get_session() as session: dummy = Dummy(value=1) session.add(dummy) session.commit() p = multiprocessing.Process(target=target, args=(dummy.id,)) p.start() p.join() session.refresh(dummy) assert dummy.value == 2 if __name__ == '__main__': i = 1 while True: print(i) main() i += 1
Since the source code is more complex and cannot simply be switched to the latest version, I would like to understand why one of these works and the other does not.
The only obvious difference is that the failure code uses global variables for the engine and the session — they are shared with write-to-write with child processes. However, since I reset them directly after the plug, I do not understand how this can be a problem.
Update
I re-run two pieces of code with the latest version of SQLAlchemy (1.1.5), using both Python 2.7 and Python 3.4. Both results are mainly described above. However, in Python 2.7, the failure of the first part of the code now occurs at the 13th iteration (reproducible), and at 3.4 it already occurs at the third iteration (also reproducible). The second part of the code works without problems in both versions. Here's the trace from 3.4:
1 2 3 Traceback (most recent call last): File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context context) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute cursor.execute(statement, parameters) psycopg2.OperationalError: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "fork_test.py", line 64, in <module> main() File "fork_test.py", line 55, in main session.refresh(dummy) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/orm/session.py", line 1424, in refresh only_load_props=attribute_names) is None: File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/orm/loading.py", line 223, in load_on_ident return q.one() File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/orm/query.py", line 2749, in one ret = self.one_or_none() File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/orm/query.py", line 2719, in one_or_none ret = list(self) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/orm/query.py", line 2790, in __iter__ return self._execute_and_instances(context) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/orm/query.py", line 2813, in _execute_and_instances result = conn.execute(querycontext.statement, self._params) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 945, in execute return meth(self, multiparams, params) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement compiled_sql, distilled_params File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context context) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1393, in _handle_dbapi_exception exc_info File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause reraise(type(exception), exception, tb=exc_tb, cause=cause) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 186, in reraise raise value.with_traceback(tb) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context context) File "/home/vagrant/latest-sqlalchemy-3.4/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 470, in do_execute cursor.execute(statement, parameters) sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. [SQL: 'SELECT dummies.id AS dummies_id, dummies.value AS dummies_value \nFROM dummies \nWHERE dummies.id = %(param_1)s'] [parameters: {'param_1': 3397}]
Here's the PostgreSQL log (it's the same for 2.7 and 3.4):
2017-01-18 10:59:36 UTC [22429-1] LOG: database system was shut down at 2017-01-18 10:59:35 UTC 2017-01-18 10:59:36 UTC [22429-2] LOG: MultiXact member wraparound protections are now enabled 2017-01-18 10:59:36 UTC [22428-1] LOG: database system is ready to accept connections 2017-01-18 10:59:36 UTC [22433-1] LOG: autovacuum launcher started 2017-01-18 10:59:36 UTC [22435-1] [unknown]@[unknown] LOG: incomplete startup packet 2017-01-18 11:00:10 UTC [22466-1] user@db LOG: SSL error: decryption failed or bad record mac 2017-01-18 11:00:10 UTC [22466-2] user@db LOG: could not receive data from client: Connection reset by peer
(Note that a message about an incomplete starter pack is harmless )