How does pandas.read_sql_query () query a TEMP table?

In the process of converting Python code, I switched to a new version of Pandas 0.14.1 based on SQLAlchemy.

The general model we use (in the general case):

connection = db.connect() # open connection/session sql = 'CREATE TEMP TABLE table1 AS SELECT ...' connection.execute(sql) ... other sql that creates TEMP tables from various joins of previous TEMP tables ... sql = 'CREATE TEMP TABLE tableN AS SELECT ...' connection.execute(sql) result = connection.query('SELECT * FROM tableN WHERE ...') connection.close() 

Now that the connection is closed, the TEMP tables are cleared by the database server. However, since the final selection request uses the same connection / session, it can access tables.

How can I achieve similar usage with SQLAlchemy and pd.read_sql_query ()?

For instance:

 engine = sqlalchemy.create_engine('netezza://@mydsn') connection = engine.connect() sql = 'CREATE TEMP TABLE tmptable AS SELECT ...' connection.execute(sql) result = pd.read_sql_query('SELECT * FROM tmptable WHERE ...', engine) 

gives a DB error that the TEMP tmptable table does not exist. Presumably, this is due to the fact that passing the engine to read_sql_query () requires it to open a new connection, which has an independent session area and, therefore, cannot see the TEMP table. Is that a reasonable guess?

Is there any way around this? (passing a connection to read_sql_query () is not supported)

(I know that I can combine SQL in one line with the separating operators, but this is a simplification of the real situation when TEMP tables are created by a variety of functions that are called by others nested in 3-4 depths. To achieve this, you will need to implement a layer that can combine SQL through a few calls before its release, which I would prefer to avoid implementation if there is a better way).

Use -
Pandas: 0.14.1
sqlalchemy: 0.9.7
pyodbc: 3.0.6
Win7 x86_64 Canopy Python distribution (Python 2.7.6)
Josh Kun Netezza SQLAlchemy dialect from https://github.com/deontologician/netezza_sqlalchemy

+8
source share
4 answers

Now you can pass SQLAlchemy plugin to pandas.read_sql . From the docs :

pandas.read_sql (sql, con, index_col = No, coerce_float = True, params = No, parse_dates = No, columns = No, fragment size = No)

...

con: SQLAlchemy pluggable (mechanism / connection) or database row URI

or DBAPI2 connection (standby)

Using SQLAlchemy allows you to use any database supported by this library. If the object is DBAPI2, only sqlite3 is supported.

So this should work:

 engine = sqlalchemy.create_engine('netezza://@mydsn') connection = engine.connect() sql = 'CREATE TEMP TABLE tmptable AS SELECT ...' connection.execute(sql) result = pd.read_sql('SELECT * FROM tmptable WHERE ...', con=connection) 
+1
source

You are using Python and Netezza, I have used R and SQL Server, so this could be different. In my script, I ran into a similar problem. sp_execute_external_script in T-SQL, which allows external code to run in the database, only allows select statements. This was cumbersome for me because I wanted to run a stored procedure to create a temporary table for selection. As an alternative, I could use common table expressions, unions, etc. Perhaps it is worth continuing the investigation.

0
source

I understand the problem, but does the creation of regular tables not work? You can come up with a convention such as CREATE TABLE TEMP_t1' etc., and DROP' at the end of the session.

0
source

As @ssharma says, you can now pass in SQLAlchemy, connected to pandas.read_sql. If you are creating a session with the session creator, you need a connection object.

To read uncommitted changes, you should use the same connection as this:

 engine = sqlalchemy.create_engine('netezza://@mydsn') session = sessionmaker(bind=self.engine)() sql = 'CREATE TEMP TABLE tmptable AS SELECT ...' session.execute(sql) result = pd.read_sql('SELECT * FROM tmptable WHERE ...', con=session.connection()) 
0
source

Source: https://habr.com/ru/post/976525/


All Articles