How to get a Python pandas DataFrame from a string written by print()?

Question

This is an updated version of the question. It provides a convenience function

 pd_read_printed(str_printed_df)

that creates a pandas DataFrame from a string previously produced by print(some_pandas_DataFrame):

    from io import StringIO

    import pandas as pd

    def pd_read_printed(str_printed_df):
        # Treat any run of whitespace as the column separator
        return pd.read_csv(StringIO(str_printed_df), sep=r"\s+")

I put it together for my own use after receiving the answers to the original question, which follows:

I often see the contents of a pandas DataFrame in its printed form on the Internet, for example:

    df1_as_string = """
        Sp   Mt Value  count
    4  MM2   S4    bg     10
    5  MM2   S4   dgd      1
    6  MM4   S2    rd      2
    7  MM4   S2    cb      8
    8  MM4   S2   uyi      8
    """

The question is: how can I turn such a string into a variable holding a DataFrame, with a call along the lines of

    df1 = pandas.someToMeUnknownPandasFunction(df1_as_string)

Now you can use the function above to create a DataFrame from df1_as_string:

 df1 = pd_read_printed(df1_as_string)

and check if it works as expected:

 print(df1)

gives:

        Sp  Mt Value  count
    4  MM2  S4    bg     10
    5  MM2  S4   dgd      1
    6  MM4  S2    rd      2
    7  MM4  S2    cb      8
    8  MM4  S2   uyi      8
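One way to convince yourself that the approach round-trips is to build the frame directly, render it to text, and parse that text back. The sketch below assumes `to_string()` yields the same layout that `print()` shows for a small frame; the names `original`, `printed`, and `restored` are only illustrative.

```python
from io import StringIO

import pandas as pd

# Build the example frame directly, with the same index labels as in the question
original = pd.DataFrame(
    {
        "Sp": ["MM2", "MM2", "MM4", "MM4", "MM4"],
        "Mt": ["S4", "S4", "S2", "S2", "S2"],
        "Value": ["bg", "dgd", "rd", "cb", "uyi"],
        "count": [10, 1, 2, 8, 8],
    },
    index=[4, 5, 6, 7, 8],
)

# Render the frame as text (assumed to match the print() layout for small frames)
printed = original.to_string()

# Parse the text back; the header has one field fewer than the data rows,
# so read_csv uses the first column as the index
restored = pd.read_csv(StringIO(printed), sep=r"\s+")

print(restored)
print(restored.equals(original))
```

This only works cleanly when no cell contains whitespace, because every run of spaces is treated as a column separator.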

+5

python string python-3.x pandas csv

Claudio Apr 23 '17 at 13:02


2 answers

Two methods

option 1
pd.read_clipboard

This is my go-to method for simply formatted frames. I copy the text of the printed frame and follow it up with df = pd.read_clipboard()

option 2
StringIO + pd.read_csv

For frames with a more complex structure, I may need some of the options in read_csv, so I set it up this way. Keep in mind that for the data you provided I would almost never do it this way, because it takes me longer to get to a DataFrame.

    from io import StringIO
    import pandas as pd

    df1_as_string = """
        Sp   Mt Value  count
    4  MM2   S4    bg     10
    5  MM2   S4   dgd      1
    6  MM4   S2    rd      2
    7  MM4   S2    cb      8
    8  MM4   S2   uyi      8
    """

    # sep=r"\s+" splits on runs of whitespace (same effect as delim_whitespace=True)
    df = pd.read_csv(StringIO(df1_as_string), sep=r"\s+")

Either way, I get:

    print(df)

        Sp  Mt Value  count
    4  MM2  S4    bg     10
    5  MM2  S4   dgd      1
    6  MM4  S2    rd      2
    7  MM4  S2    cb      8
    8  MM4  S2   uyi      8
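As a sketch of the kind of read_csv option that option 2 makes available, consider a printed frame whose column holds zero-padded codes. The data and the names `printed` and `codes` below are made up for illustration; the `dtype` argument asks read_csv to keep that column as text instead of converting it to integers.

```python
from io import StringIO

import pandas as pd

# Hypothetical printed frame: the "code" column holds zero-padded identifiers
printed = """
  code  count
0  007     10
1  042      1
"""

# dtype keeps "code" as strings; without it the values would be parsed as 7 and 42
codes = pd.read_csv(StringIO(printed), sep=r"\s+", dtype={"code": str})

print(codes["code"].tolist())
```

Options like this cannot be passed when the text is parsed with default settings, which is where the StringIO route earns its extra typing.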

+4

piRSquared Apr 23 '17 at 13:04


jezrael · Accepted Answer · 2017-04-23T13:04:27+0000

Use read_clipboard .

 df = pd.read_clipboard()

Or use read_csv with a separator of one or more whitespace characters, either sep=r'\s+' or delim_whitespace=True:

    from io import StringIO
    import pandas as pd

    df = pd.read_csv(StringIO(df1_as_string), sep=r"\s+")

    # equivalent, but delim_whitespace is deprecated since pandas 2.2
    df = pd.read_csv(StringIO(df1_as_string), delim_whitespace=True)

    print(df)

        Sp  Mt Value  count
    4  MM2  S4    bg     10
    5  MM2  S4   dgd      1
    6  MM4  S2    rd      2
    7  MM4  S2    cb      8
    8  MM4  S2   uyi      8
