Pandas dataframe with 2 row header and export to csv

I have a data frame

 df = pd.DataFrame(columns=["AA", "BB", "CC"])
 df.loc[0] = ["a", "b", "c1"]
 df.loc[1] = ["a", "b", "c2"]
 df.loc[2] = ["a", "b", "c3"]

I need to add a second row of strings to the header:

 df.columns = pd.MultiIndex.from_tuples(list(zip(df.columns, ["DD", "EE", "FF"])))

My df now looks like this:

   AA BB  CC
   DD EE  FF
 0  a  b  c1
 1  a  b  c2
 2  a  b  c3

but when I write this dataframe to a CSV file

 df.to_csv("test.csv", index = False) 

I get one more row than expected (the empty ",," line):

 AA,BB,CC
 DD,EE,FF
 ,,
 a,b,c1
 a,b,c2
 a,b,c3
4 answers

This is an ugly hack, but if you need something that works right now (tm), you can write the file in two parts:

 >>> pd.DataFrame(df.columns.tolist()).T.to_csv("noblankrows.csv", mode="w", header=False, index=False)
 >>> df.to_csv("noblankrows.csv", mode="a", header=False, index=False)
 >>> !cat noblankrows.csv
 AA,BB,CC
 DD,EE,FF
 a,b,c1
 a,b,c2
 a,b,c3
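If you want to reuse the two-part write, it can be wrapped in a small helper. This is only a minimal sketch, assuming a DataFrame with MultiIndex columns and an index you do not want in the file; the name write_two_row_header_csv and the temporary path are made up for illustration and are not part of pandas.

```python
import os
import tempfile

import pandas as pd


def write_two_row_header_csv(df, path):
    """Hypothetical helper: one CSV line per column level, no blank row."""
    # df.columns.tolist() gives one tuple per column; transposing the
    # resulting frame turns each column level into one row.
    header_rows = pd.DataFrame(df.columns.tolist()).T
    header_rows.to_csv(path, mode="w", header=False, index=False)
    # Append the data and suppress the header pandas would write itself.
    df.to_csv(path, mode="a", header=False, index=False)


df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("BB", "EE"), ("CC", "FF")]),
)

path = os.path.join(tempfile.mkdtemp(), "noblankrows.csv")
write_two_row_header_csv(df, path)

with open(path) as f:
    lines = f.read().splitlines()
print(lines)
```

Because the header and the data are written by two separate calls, nothing checks that they have the same number of columns, so this is best kept for quick exports.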

I think this is a bug in to_csv. If you are looking for workarounds, here are a couple.

To read this CSV back in, specify the header rows *:

 In [11]: csv = "AA,BB,CC\nDD,EE,FF\n,,\na,b,c1\na,b,c2\na,b,c3"

 In [12]: pd.read_csv(StringIO(csv), header=[0, 1])
 Out[12]:
   AA BB  CC
   DD EE  FF
 0  a  b  c1
 1  a  b  c2
 2  a  b  c3

* Strangely, it seems to ignore the empty line.
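For a file that has no blank row, reading the two header lines back might look like the sketch below. The CSV text is typed inline purely for illustration, and the names csv_text and df2 are placeholders.

```python
from io import StringIO

import pandas as pd

# Two header lines followed by three data rows, with no blank ",," row.
csv_text = "AA,BB,CC\nDD,EE,FF\na,b,c1\na,b,c2\na,b,c3\n"

# header=[0, 1] tells read_csv to build a two-level column MultiIndex
# from the first two lines of the file.
df2 = pd.read_csv(StringIO(csv_text), header=[0, 1])

print(df2.columns.tolist())
print(df2.shape)
```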

To write, you can write the header first and then append the data:

 with open('test.csv', 'w') as f:
     f.write('\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n')
 df.to_csv('test.csv', mode='a', index=False, header=False)

Note what the header-writing part produces for the MultiIndex columns:

 In [21]: '\n'.join([','.join(h) for h in zip(*df.columns)]) + '\n'
 Out[21]: 'AA,BB,CC\nDD,EE,FF\n'
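One caveat with joining the labels by hand is that a label containing a comma or a quote would not be escaped. A possible variant, sketched here under the assumption that the labels are strings, hands the header rows to the standard csv module instead. The "B,B" label is made up to show the quoting, and the in-memory buffer stands in for a real file.

```python
import csv
import io

import pandas as pd

df = pd.DataFrame(
    [["a", "b", "c1"], ["a", "b", "c2"], ["a", "b", "c3"]],
    # "B,B" is an invented label that needs quoting in CSV.
    columns=pd.MultiIndex.from_tuples([("AA", "DD"), ("B,B", "EE"), ("CC", "FF")]),
)

buf = io.StringIO()

# zip(*df.columns) transposes the column tuples: one tuple per header level.
writer = csv.writer(buf, lineterminator="\n")
writer.writerows(zip(*df.columns))

# Then append the data rows without pandas' own header.
df.to_csv(buf, index=False, header=False)

lines = buf.getvalue().splitlines()
print(lines)
```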

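Use df.to_csv("test.csv", index = False, tupleize_cols=True) to get the resulting CSV below. Note that tupleize_cols was later deprecated and removed from pandas, so this applies to older versions only.

 "('AA', 'DD')","('BB', 'EE')","('CC', 'FF')"
 a,b,c1
 a,b,c2
 a,b,c3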

To read it:

 df2 = pd.read_csv("test.csv", tupleize_cols=True)
 df2.columns = pd.MultiIndex.from_tuples(eval(','.join(df2.columns)))
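Since eval will run whatever code happens to be in the header, a more cautious way to turn the tuple-like strings back into a MultiIndex is ast.literal_eval, which only accepts Python literals. This is a swapped-in technique rather than what the snippet above does, and the flat_labels list is typed in by hand to stand for the column names read from such a file.

```python
import ast

import pandas as pd

# Column labels as they appear in a CSV written with tuple-style headers.
flat_labels = ["('AA', 'DD')", "('BB', 'EE')", "('CC', 'FF')"]

# literal_eval parses each string into a real tuple without executing code.
tuples = [ast.literal_eval(label) for label in flat_labels]
columns = pd.MultiIndex.from_tuples(tuples)

print(columns.tolist())
```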

To get the exact result you want:

 # requires: import numpy as np
 with open('test.csv', 'w') as f:  # 'w' so the file starts out empty
     pd.DataFrame(np.asanyarray(df.columns.tolist())).T.to_csv(f, index=False, header=False)
     df.to_csv(f, index=False, header=False)

Based on @DSM's solution:

If you need (like me) to apply the same hack when exporting to Excel, the main change (besides the expected differences in the to_excel method) is that you actually have to remove the MultiIndex used for your column labels.

This is because, unlike .to_csv, .to_excel does not support writing out a df that has a MultiIndex for columns but no index (that is, passing index=False to .to_excel).

In any case, here's how it would look:

 >>> writer = pd.ExcelWriter("noblankrows.xlsx")
 >>> headers = pd.DataFrame(df.columns.tolist()).T
 >>> headers.to_excel(writer, header=False, index=False)
 >>> df.columns = pd.Index(range(len(df.columns)))  # that's what I was referring to...
 >>> df.to_excel(writer, header=False, index=False, startrow=len(headers))
 >>> writer.save()  # on pandas 2.0 and later, use writer.close() instead
 >>> pd.read_excel("noblankrows.xlsx").to_csv(sys.stdout, index=False)
 AA,BB,CC
 DD,EE,FF
 a,b,c1
 a,b,c2
 a,b,c3

Source: https://habr.com/ru/post/971253/

