I am having some strange errors that are apparently caused by the connections used by SQLAlchemy, and I cannot pinpoint them. I was hoping someone could tell me what is going on here.
We are working with Pyramid (version 1.5b1) and use SQLAlchemy (version 0.9.6) for all our database connections. Sometimes we get errors related to a db connection or session, most of the time a cursor already closed or This Connection is closed error, but we also get other related exceptions:
(OperationalError) connection pointer is NULL
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
A conflicting state is already present in the identity map for key (<class '...'>, (1001L,))
This Connection is closed (original cause: ResourceClosedError: This Connection is closed)
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session; lazy load operation of attribute '...' cannot proceed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
'NoneType' object has no attribute 'twophase'
(OperationalError) connection pointer is NULL
This session is in 'prepared' state; no further ...
There is no reliable way to reproduce them; only by refreshing many times do they occur at some point. So I created a script using multi-mechanize to spam different URLs at the same time and see where and when this happens.
The URL being hit does not seem to matter much; the errors occur when there are concurrent requests and some of them span a longer time (with other requests being served in between). This appears to indicate some kind of threading problem: that a session or connection is shared between different threads.
Searching for these issues turned up many topics, most of which say to use scoped sessions, but we already do:
db_session = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), autocommit=False, autoflush=False))
db_meta = MetaData()
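As far as I understand it, scoped_session keys sessions to the current thread by default, so each thread should already get its own session. A quick standalone check of that assumption (just an illustration, not our app code):

import threading
from sqlalchemy.orm import scoped_session, sessionmaker

# The scoped_session registry should hand back a distinct Session object
# per thread (the default scope is thread-local).
Session = scoped_session(sessionmaker())
seen = []

def grab():
    seen.append(id(Session()))

workers = [threading.Thread(target=grab) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print seen           # two different ids: one Session per worker thread
print id(Session())  # the main thread gets yet another Session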
We have a BaseModel for all of our ORM objects:
BaseModel = declarative_base(cls=BaseModelObj, metaclass=BaseMeta, metadata=db_meta)
We use the pyramid_tm tween to handle transactions during the request.
We attach db_session.remove() to Pyramid's NewResponse event (which fires after everything is done). I also tried putting it in a separate tween running around pyramid_tm, and not doing it at all; none of these seem to have any effect, so the NewResponse event seemed like the cleanest place to put it.
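For completeness, a minimal sketch of that subscriber (assuming the handler is named cleanup_db_session, to match the registration shown further down):

from project.db import db_session

def cleanup_db_session(event):
    # Discard the thread-local session once the response is ready, so the
    # next request served by this thread starts with a fresh session.
    db_session.remove()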
We create the engine in the main entry point of our Pyramid project, using NullPool, and leave connection pooling to pgbouncer. We also configure the session and the bind for our BaseModel there:
engine = engine_from_config(config.registry.settings, 'sqlalchemy.', poolclass=NullPool)
db_session.configure(bind=engine, query_cls=FilterQuery)
BaseModel.metadata.bind = engine
config.add_subscriber(cleanup_db_session, NewResponse)
Throughout the application, all db operations are done using:
from project.db import db_session
...
db_session.query(MyModel).filter(...)
db_session.execute(...)
We use psycopg2==2.5.2 to handle the connection to Postgres, with pgbouncer in between.
I made sure no references to db_session or its connections are saved anywhere (which could lead to reuse across threads).
I also ran the spam test against various web servers. Using waitress and cogen I got the errors very easily; using wsgiref we unsurprisingly got no errors (it is single-threaded). Using uwsgi and gunicorn (4 workers, gevent), I did not get any errors.
Given the differences between the web servers used, I thought it might be caused by some servers handling requests in threads while others use new processes (maybe a forking problem)? To complicate the situation even more, as time went on and I ran some new tests, the problem went away with waitress but now happened with gunicorn (when using gevent)! I have no idea how to debug this...
Finally, to check what happens to the connection, I attached an attribute to the connection at the start of cursor execution and tried to read that attribute again at the end of the execution:
@event.listens_for(Engine, "before_cursor_execute") def _before_cursor_execute(conn, cursor, stmt, params, context, execmany): conn.pdtb_start_timer = time.time() @event.listens_for(Engine, "after_cursor_execute") def _after_cursor_execute(conn, cursor, stmt, params, context, execmany): print conn.pdtb_start_timer
Surprisingly, this sometimes raised an exception: 'Connection' object has no attribute 'pdtb_start_timer'.
This seemed very strange to me. I found one discussion of something similar: https://groups.google.com/d/msg/sqlalchemy/GQZSjHAGkWM/rDflJvuyWnEJ and tried adding strategy='threadlocal' to the engine, which from what I understand should force one connection per thread. But this did not have any effect on the errors I am seeing... (apart from breaking some unit tests, since I need two different sessions/connections for some tests and this forces a single connection to be used).
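For reference, the strategy keyword is passed like this (a sketch using create_engine; the URL is a placeholder, and in our project the same keyword goes through engine_from_config):

from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# Sketch only: the connection string is a placeholder for our real settings.
db_url = 'postgresql+psycopg2://user:password@localhost:6432/mydb'
engine = create_engine(db_url, strategy='threadlocal', poolclass=NullPool)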
Does anyone have an idea what could be going on here, or any pointers on how to attack this problem?
Thanks in advance!
Mattis Blaas