Reading tuples from a csv file using pandas

Using pandas, I exported a data frame whose cells contain tuples of strings to a csv file. The resulting file has the following structure:

index,colA
1,"('a','b')"
2,"('c','d')"

Now I want to read it using read_csv. However, no matter what I try, pandas interprets the values as strings, not tuples. For instance:

In []: import pandas as pd
       df = pd.read_csv('test',index_col='index',dtype={'colA':tuple})
       df.loc[1,'colA']
Out[]: "('a','b')"

Is there a way to tell pandas to do the right thing? Preferably without heavy post-processing of the data frame: the actual table has 5,000 rows and 2,500 columns.

1 answer

Storing tuples in a column is usually not a good idea; many of the benefits of using Series and DataFrames are lost. That said, you can use the converters argument of read_csv to post-process each string as it is read:

>>> import ast
>>> import pandas as pd
>>> df = pd.read_csv("sillytup.csv", converters={"colA": ast.literal_eval})
>>> df
   index    colA
0      1  (a, b)
1      2  (c, d)

[2 rows x 2 columns]
>>> df.colA.iloc[0]
('a', 'b')
>>> type(df.colA.iloc[0])
<type 'tuple'>
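
For reference, here is a minimal self-contained sketch of the same approach under Python 3, applied to the sample data from the question. It assumes the file contents are fed in through io.StringIO rather than read from disk, and it keeps the question's index_col='index'. The names csv_text and tuple_columns are made up for illustration; with many columns, the converters dict could be built from a list of column names in the same way.

```python
import ast
import io

import pandas as pd

# Sample data copied from the question; StringIO stands in for the real file.
csv_text = """index,colA
1,"('a','b')"
2,"('c','d')"
"""

# Hypothetical list of columns whose cells hold tuple literals.
tuple_columns = ["colA"]

# literal_eval safely parses each cell string into a Python literal (here, a tuple).
converters = {col: ast.literal_eval for col in tuple_columns}

df = pd.read_csv(io.StringIO(csv_text), index_col="index", converters=converters)

first = df.loc[1, "colA"]
print(type(first), first)
```

Note that the converter is called once per cell in plain Python, so on a table with thousands of tuple columns the read will likely be noticeably slower than a plain read_csv.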



Source: https://habr.com/ru/post/1687576/

