Morning.
I have reduced a much larger problem to the following:
I have one file containing a DataFrame with some values in it.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'joe': [['dog'], ['cat'], ['fish'], ['rabbit']],
                       'ben': [['dog'], ['fish'], ['fish'], ['bear']]})

    df:
          ben       joe
    0   [dog]     [dog]
    1  [fish]     [cat]
    2  [fish]    [fish]
    3  [bear]  [rabbit]
The type of the data contained in this DataFrame is as follows:

    type(df.iloc[2, 1]), df.iloc[2, 1]
    >>> (list, ['fish'])
When I save the DataFrame to Excel using df.to_excel():

    writer1 = pd.ExcelWriter('Input Output Test.xlsx')
    df.to_excel(writer1, 'Sheet1')
    writer1.save()
I then immediately read it back in, in the same file, as follows:
    dfi = pd.read_excel(open('Input Output Test.xlsx'), sheetname='Sheet1')
I check the data type again:
    type(dfi.iloc[2, 1]), dfi.iloc[2, 1]
    >>> (unicode, u"['fish']")
The data is now a unicode string rather than a list. This is a problem because when I compare the two DataFrames as follows, every result is False because the values no longer have matching types:
    np.where(df['joe'] == dfi['joe'], True, False)

    dfi:
            ben         joe   test
    0   ['dog']     ['dog']  False
    1  ['fish']     ['cat']  False
    2  ['fish']    ['fish']  False
    3  ['bear']  ['rabbit']  False
What happens during the write and read process that causes this change, and how can I change it so that the str format is preserved after saving?
Edit: Unfortunately, the nature of my problem requires that the DataFrame be saved and then manipulated in a different file.
Edit in response to EdChum's comment: if I instead save these values as strings rather than lists, I still get the same error:

    df = pd.DataFrame({'joe': ['dog', 'cat', 'fish', 'rabbit'],
                       'ben': ['dog', 'fish', 'fish', 'bear']})

        ben     joe
    0   dog     dog
    1  fish     cat
    2  fish    fish
    3  bear  rabbit

    writer1 = pd.ExcelWriter('Input Output Test Joe.xlsx')
    df.to_excel(writer1, 'Sheet1')
    writer1.save()

    dfi = pd.read_excel(open('Input Output Test Joe.xlsx', 'rb'), sheetname='Sheet1')

    type(dfi.iloc[2, 1]), dfi.iloc[2, 1]
    (unicode, u'fish')
Again, the comparison fails.
Edit: Evaluating the unicode value back into a regular Python object can also be achieved with ast.literal_eval(), as described here: Converting a string representation of a list to a list in Python, or as per EdChum's suggestion.
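For reference, here is a minimal sketch of the ast.literal_eval() approach using only the standard library. The names cells_from_excel and original are hypothetical, and the cell values are hard-coded stand-ins for what read_excel() returned above, not the output of an actual Excel round trip.

```python
import ast

# Hypothetical stand-ins for the 'joe' column as read back from Excel:
# each cell is the string representation of a list, not a list.
cells_from_excel = ["['dog']", "['cat']", "['fish']", "['rabbit']"]

# The values as they were before saving.
original = [['dog'], ['cat'], ['fish'], ['rabbit']]

# literal_eval parses Python literals (lists, strings, numbers, ...)
# without executing arbitrary code, unlike eval().
restored = [ast.literal_eval(cell) for cell in cells_from_excel]

# A list never compares equal to its string representation,
# so the comparison only succeeds after parsing.
matches_before = [a == b for a, b in zip(original, cells_from_excel)]
matches_after = [a == b for a, b in zip(original, restored)]

print(matches_before)
print(matches_after)
```

On the DataFrame itself, the same function could presumably be applied per column, for example dfi['joe'].apply(ast.literal_eval), although I have not checked that against the Excel file here.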
Note: if you use to_csv() and read_csv(), this problem does not occur.
But why does to_excel() / read_excel() change the original data?