Convert psycopg2 DictRow query result to Pandas dataframe

I would like to convert the result of a psycopg2 DictRow query to a pandas dataframe, but pandas keeps complaining:

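    import pandas as pd
    import psycopg2
    import psycopg2.extras

    # conn is an existing psycopg2 connection
    curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    curs.execute("SELECT * FROM mytable")
    data = curs.fetchall()
    print(type(data))
    print(pd.DataFrame(list(data)))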

However, I always get an error, even though I explicitly pass a list:

    <type 'list'>
    TypeError: Expected list, got DictRow

The result is the same if I call pd.DataFrame(data). Can someone help me get this to work?

It would be nice if the column names carried over as well (i.e. they are extracted from the DictRow and passed to the dataframe).

Update:
Since I need to process the data, I would like to use the data from the psycopg2 query as is, rather than going through pandas, e.g. read_sql_query.

2 answers

Hmm, I finally found this hacky solution:

    # copy() turns each DictRow into a dictionary
    print(pd.DataFrame([i.copy() for i in data]))

The copy() method of the DictRow class returns an actual dictionary. With the list comprehension, I create a list of dictionaries (all with the same keys) that pandas gladly accepts.
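
The shape of that conversion can be sketched without a database. In the snippet below, `FakeDictRow` is a hypothetical stand-in written for illustration only, not the real psycopg2 class; it assumes, as described above, that `copy()` yields a dictionary keyed by column name.

```python
class FakeDictRow(list):
    """Hypothetical stand-in for psycopg2.extras.DictRow (illustration only)."""

    def __init__(self, columns, values):
        super().__init__(values)
        self._columns = list(columns)

    def copy(self):
        # Assumed to mirror DictRow.copy(): a dictionary of column name -> value.
        return dict(zip(self._columns, self))


# Pretend this came from curs.fetchall()
data = [
    FakeDictRow(["id", "name"], [1, "alice"]),
    FakeDictRow(["id", "name"], [2, "bob"]),
]

# The same list comprehension as in the answer
records = [row.copy() for row in data]
print(records)
```

With real query results, a list of dictionaries like `records` is what gets passed to pd.DataFrame, and the dictionary keys become the column names.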

I am still puzzled as to why list(data) raises a TypeError. Maybe someone else can enlighten me.
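
One possible explanation, offered as a guess and not checked against the pandas source: data is already a list, so list(data) only copies the outer container, and the rows inside are still DictRow objects. DictRow subclasses list, so a check that insists on the exact list type would reject it. The sketch below uses a hypothetical `FakeDictRow` stand-in to show that the row type survives list():

```python
class FakeDictRow(list):
    """Hypothetical stand-in for psycopg2.extras.DictRow, which subclasses list."""


# Pretend this came from curs.fetchall()
data = [FakeDictRow([1, "alice"]), FakeDictRow([2, "bob"])]

outer = list(data)  # copies only the outer container

outer_is_list = type(outer) is list            # True
row_is_exact_list = type(outer[0]) is list     # False: still FakeDictRow
row_is_list_subclass = isinstance(outer[0], list)  # True

# Converting each row individually does produce plain lists
plain_rows = [list(row) for row in data]

print(outer_is_list, row_is_exact_list, row_is_list_subclass)
print(plain_rows)
```

If this guess is right, converting each row (with copy() or list()) instead of the outer list is what makes the difference.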


UPDATE: pandas.read_sql_query() is a more elegant way to read an SQL query into a dataframe without working with a psycopg2 cursor directly. See the pandas docs.

I had the same problem. The easiest way is to convert the DictRow to a numpy array.

    import numpy as np

    curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    curs.execute("SELECT * FROM mytable")
    data = curs.fetchall()
    print(type(data))
    print(pd.DataFrame(np.array(data)))

If you want the column names, they are available as the keys of each DictRow. However, the order of the keys is not guaranteed to match the column order of the numpy array. Thus, one (inelegant) way is to build each row explicitly in key order:

    curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    curs.execute("SELECT * FROM mytable")
    data = curs.fetchall()
    print(type(data))
    # Materialize the keys so the same order is used for rows and columns
    colNames = list(data[0].keys())
    print(pd.DataFrame([[row[col] for col in colNames] for row in data], columns=colNames))
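
Another way to get the column names in query order is the DB-API cursor.description attribute, whose entries start with the column name. The sketch below uses the standard-library sqlite3 module as a stand-in so it runs without a PostgreSQL server; the table contents and the ORDER BY clause are made up for the example, and the assumption is that a psycopg2 cursor can be used the same way.

```python
import sqlite3

# sqlite3 is only a stand-in here; with psycopg2, conn would be a PostgreSQL connection
conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
curs.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "alice"), (2, "bob")])

curs.execute("SELECT * FROM mytable ORDER BY id")

# The first item of each description entry is the column name, in query order
col_names = [desc[0] for desc in curs.description]

# Convert every row to a plain list
rows = [list(row) for row in curs.fetchall()]

print(col_names)
print(rows)
conn.close()
```

The result could then be handed to pandas as pd.DataFrame(rows, columns=col_names), which avoids relying on the key order of the rows.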

Source: https://habr.com/ru/post/1243756/

