In the process of converting some Python code, I switched to Pandas 0.14.1, whose SQL functions are now based on SQLAlchemy.
The general pattern we use is:
    connection = db.connect()   # open connection/session
    sql = 'CREATE TEMP TABLE table1 AS SELECT ...'
    connection.execute(sql)
    # ... other SQL that creates TEMP tables from various joins of previous TEMP tables ...
    sql = 'CREATE TEMP TABLE tableN AS SELECT ...'
    connection.execute(sql)
    result = connection.query('SELECT * FROM tableN WHERE ...')
    connection.close()
When the connection is closed, the TEMP tables are cleaned up by the database server. Crucially, because the final SELECT runs on the same connection/session, it can still see those TEMP tables.
How can I achieve similar usage with SQLAlchemy and pd.read_sql_query()?
For instance:
    engine = sqlalchemy.create_engine('netezza://@mydsn')
    connection = engine.connect()
    sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
    connection.execute(sql)
    result = pd.read_sql_query('SELECT * FROM tmptable WHERE ...', engine)
This fails with a database error saying the TEMP table tmptable does not exist. Presumably this is because passing the engine to read_sql_query() makes it open a new connection, which has its own independent session and therefore cannot see the TEMP table. Is that a reasonable guess?
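A quick way to check that hypothesis (a sketch, assuming the Netezza dialect allows TEMP tables to be created this way) is to open two connections from the same engine and confirm that the second one cannot see a TEMP table created on the first:

    conn_a = engine.connect()
    conn_b = engine.connect()
    conn_a.execute('CREATE TEMP TABLE t1 AS SELECT ...')
    conn_a.execute('SELECT * FROM t1').fetchall()   # same session: works
    conn_b.execute('SELECT * FROM t1').fetchall()   # separate session: expected to fail, table not found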
Is there any way around this? (Passing a connection to read_sql_query() is not supported.)
(I know I could combine the SQL into a single string with statement separators, but that is a simplification of the real situation: the TEMP tables are created by a variety of functions nested 3-4 calls deep. Combining the SQL would mean implementing a layer that accumulates SQL across those calls before executing it, which I would prefer to avoid if there is a better way.)
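The closest workaround I can see (a sketch, not something I'm happy with) is to run the final SELECT on the same SQLAlchemy connection and build the DataFrame by hand from the ResultProxy, bypassing read_sql_query() entirely:

    import pandas as pd
    import sqlalchemy

    engine = sqlalchemy.create_engine('netezza://@mydsn')
    connection = engine.connect()
    connection.execute('CREATE TEMP TABLE tmptable AS SELECT ...')

    # final SELECT on the *same* connection, so the TEMP table is visible
    result = connection.execute('SELECT * FROM tmptable WHERE ...')
    df = pd.DataFrame(result.fetchall(), columns=result.keys())
    connection.close()

That loses read_sql_query()'s conveniences (index_col, parse_dates, etc.), which is why I'd prefer a proper solution.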
Using:
Pandas: 0.14.1
sqlalchemy: 0.9.7
pyodbc: 3.0.6
Win7 x86_64 Canopy Python distribution (Python 2.7.6)
Josh Kun's Netezza SQLAlchemy dialect from https://github.com/deontologician/netezza_sqlalchemy