How to get a Python pandas DataFrame from a string written by print()?

This is an updated version of the question that provides a convenience function

pd_read_printed(str_printed_df)

which creates a pandas DataFrame from a string previously written with print(some_pandas_DataFrame):

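```python
from io import StringIO  # pandas.compat.StringIO no longer exists in current pandas

import pandas as pd


def pd_read_printed(str_printed_df):
    """Create a DataFrame from text produced by print(some_pandas_DataFrame)."""
    # Columns in printed output are separated by runs of whitespace.
    return pd.read_csv(StringIO(str_printed_df), sep=r"\s+")
```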

I put it together for my own use after receiving the answers to the following question:

I often see the contents of a pandas DataFrame posted on the Internet in its printed form, for example:

```python
df1_as_string = """
    Sp  Mt  Value  count
4  MM2  S4     bg     10
5  MM2  S4    dgd      1
6  MM4  S2     rd      2
7  MM4  S2     cb      8
8  MM4  S2    uyi      8
"""
```

The question is: how can I turn such a string variable into a variable holding a DataFrame, along the lines of:

```python
df1 = pandas.someToMeUnknownPandasFunction(df1_as_string)
```

Now you can use the supplied function to create a DataFrame from df1_as_string:

```python
df1 = pd_read_printed(df1_as_string)
```

and check that it works as expected:

```python
print(df1)
```

gives:

```
    Sp  Mt  Value  count
4  MM2  S4     bg     10
5  MM2  S4    dgd      1
6  MM4  S2     rd      2
7  MM4  S2     cb      8
8  MM4  S2    uyi      8
```
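To confirm the round trip without comparing output by eye, one option is to turn a frame into text with to_string() (for a small frame this is the same text print() shows) and parse it back. The sketch below rebuilds the example frame by hand; frame_from_text is a hypothetical stand-in for pd_read_printed that uses the same sep=r"\s+" call.

```python
from io import StringIO

import pandas as pd


def frame_from_text(text):
    # Hypothetical stand-in for pd_read_printed: columns are separated by runs of whitespace.
    return pd.read_csv(StringIO(text), sep=r"\s+")


original = pd.DataFrame(
    {
        "Sp": ["MM2", "MM2", "MM4", "MM4", "MM4"],
        "Mt": ["S4", "S4", "S2", "S2", "S2"],
        "Value": ["bg", "dgd", "rd", "cb", "uyi"],
        "count": [10, 1, 2, 8, 8],
    },
    index=[4, 5, 6, 7, 8],
)

# For a frame this small, to_string() produces the same text that print() displays.
text = original.to_string()
parsed = frame_from_text(text)

print(parsed.index.tolist())    # [4, 5, 6, 7, 8]
print(parsed.columns.tolist())  # ['Sp', 'Mt', 'Value', 'count']
print(parsed.equals(original))
```

This only round-trips cleanly when no cell or column name contains whitespace and the frame is small enough that print() neither truncates rows nor wraps columns.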
2 answers

Use read_clipboard: copy the printed frame to the clipboard, then run:

```python
df = pd.read_clipboard()
```
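read_clipboard passes the clipboard text on to read_csv and splits on runs of whitespace by default, so the same text can also be parsed where no clipboard is available (for example on a headless server). A minimal sketch, assuming a hypothetical helper read_printed_frame that takes pasted text when given and otherwise falls back to the clipboard:

```python
from io import StringIO

import pandas as pd


def read_printed_frame(text=None, **kwargs):
    """Hypothetical helper: parse a printed DataFrame from text, or from the clipboard if text is None."""
    if text is None:
        # Needs a working clipboard backend on the machine running the code.
        return pd.read_clipboard(**kwargs)
    # read_clipboard splits on whitespace runs by default; do the same for plain text.
    kwargs.setdefault("sep", r"\s+")
    return pd.read_csv(StringIO(text), **kwargs)


pasted = """
    Sp  Mt  Value  count
4  MM2  S4     bg     10
5  MM2  S4    dgd      1
"""
df = read_printed_frame(pasted)
print(df)
```

Any extra keyword arguments (such as parse_dates or dtype) are forwarded unchanged, because read_clipboard also hands its keyword arguments to read_csv.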

Or use read_csv with a separator of one or more whitespace characters, sep=r'\s+' or delim_whitespace=True (the latter is deprecated since pandas 2.2):

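```python
from io import StringIO  # pandas.compat.StringIO no longer exists in current pandas

import pandas as pd

df = pd.read_csv(StringIO(df1_as_string), sep=r"\s+")
# Older equivalent; delim_whitespace is deprecated since pandas 2.2:
# df = pd.read_csv(StringIO(df1_as_string), delim_whitespace=True)
```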

```python
print(df)
```

```
    Sp  Mt  Value  count
4  MM2  S4     bg     10
5  MM2  S4    dgd      1
6  MM4  S2     rd      2
7  MM4  S2     cb      8
8  MM4  S2    uyi      8
```
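The index comes back because the printed header line has one field fewer than each data row (the index column has no name), and in that case read_csv uses the first column as the row labels. A small check of this, assuming the same df1_as_string layout as in the question:

```python
from io import StringIO

import pandas as pd

df1_as_string = """
    Sp  Mt  Value  count
4  MM2  S4     bg     10
5  MM2  S4    dgd      1
6  MM4  S2     rd      2
7  MM4  S2     cb      8
8  MM4  S2    uyi      8
"""

df = pd.read_csv(StringIO(df1_as_string), sep=r"\s+")

# 4 header names but 5 fields per data row: the extra leading field becomes the index.
print(df.index.tolist())  # [4, 5, 6, 7, 8]

# Numeric columns are converted automatically; text columns stay as strings.
print(df.dtypes)
```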

Two methods

Option 1: pd.read_clipboard

This is my go-to method for simply formatted frames. I copy the text of the data block and run df = pd.read_clipboard().

Option 2: StringIO + pd.read_csv

For frames with a more complex structure I may need to pass extra options to read_csv, and this approach lets me configure them. For simple data like yours I would rarely do it this way, because it takes longer to get to the DataFrame.

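```python
from io import StringIO

import pandas as pd

df1_as_string = """
    Sp  Mt  Value  count
4  MM2  S4     bg     10
5  MM2  S4    dgd      1
6  MM4  S2     rd      2
7  MM4  S2     cb      8
8  MM4  S2    uyi      8
"""

# One or more whitespace characters separate the columns
# (same as the older delim_whitespace=True, deprecated since pandas 2.2).
df = pd.read_csv(StringIO(df1_as_string), sep=r"\s+")
```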

Either way, I get:

```python
print(df)
```

```
    Sp  Mt  Value  count
4  MM2  S4     bg     10
5  MM2  S4    dgd      1
6  MM4  S2     rd      2
7  MM4  S2     cb      8
8  MM4  S2    uyi      8
```

Source: https://habr.com/ru/post/1267067/

