I have some old tables that I need to compare with newer versions of them, and I want to use pandas for the comparison.
The data looks something like this:
data_old = [
    ["John", 1, 2, 3, None],
    ["Mary", 1, 4, None, "Apples"],
    ["Jean", None, 4, -4, "Peaches"],
]
columns = ["name", "A", "B", "C", "D"]
data_new = [
    ["Jean", 5, 4, -4, "Peaches"],
    ["John", 1, -2, 3, None],
    ["Mary", 1, 4, None, "Apples"],
]
Both data sets are fairly small, and the values can be text, numbers, or NULL (None). I want to build a DataFrame containing only the rows that changed and export it to CSV.
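Because some values are NULL, it helps to know how pandas compares missing values. Here is a small sketch. The two-column frames are made up for illustration and are not the real tables. In a numeric column `None` is stored as NaN, and a plain `!=` reports a missing value as different from another missing value, so unchanged NULL cells would show up as changes unless you mask them out.

```python
import pandas as pd

# Column "C" is numeric, so None is stored as NaN; column "D" holds text,
# so None is kept as a missing value (None or NaN depending on the dtype).
old = pd.DataFrame({"C": [3, None], "D": [None, "Apples"]})
new = pd.DataFrame({"C": [3, None], "D": [None, "Apples"]})

# Plain != treats a missing value as unequal to everything, including
# another missing value, so the unchanged NaN in "C" is flagged.
naive = old != new
print(naive)

# NaN-aware inequality: a cell counts as changed only if the values differ
# AND they are not both missing.
changed = (old != new) & ~(old.isna() & new.isna())
print(changed)
```

With the mask applied, nothing in these two identical frames is reported as changed.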
My problem is that when I sort both DataFrames by name and then compare them, I get the following error:
ValueError: Can only compare identically-labeled DataFrame objects
This is what I'm doing:
import pandas as pd

df_old = pd.DataFrame(data=data_old, columns=columns)
df_old.sort_values(by="name", inplace=True)  # df.sort(columns=...) in older pandas
df_new = pd.DataFrame(data=data_new, columns=columns)
df_new.sort_values(by="name", inplace=True)
ne = (df_old != df_new).any(axis=1)  # raises the ValueError above
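The comparison fails because sorting reorders the rows but keeps their original index labels. `!=` between two DataFrames requires identical row and column labels in the same order. The sketch below shows one way around it, assuming every `name` is unique and appears in both tables. It makes `name` the index so rows are matched by name, builds a NaN-aware mask, and writes the changed rows to CSV. The output file name `changes.csv` is only a placeholder.

```python
import pandas as pd

columns = ["name", "A", "B", "C", "D"]
data_old = [
    ["John", 1, 2, 3, None],
    ["Mary", 1, 4, None, "Apples"],
    ["Jean", None, 4, -4, "Peaches"],
]
data_new = [
    ["Jean", 5, 4, -4, "Peaches"],
    ["John", 1, -2, 3, None],
    ["Mary", 1, 4, None, "Apples"],
]

df_old = pd.DataFrame(data=data_old, columns=columns)
df_new = pd.DataFrame(data=data_new, columns=columns)

# Sorting by "name" keeps the original index labels, so the two frames get
# the same labels in a different order ([2, 0, 1] vs [0, 1, 2]),
# which is why != refuses to compare them.
print(df_old.sort_values(by="name").index.tolist())
print(df_new.sort_values(by="name").index.tolist())

# Use "name" as the row label so rows are matched by name, not position.
# Assumes each name is unique and present in both tables.
old = df_old.set_index("name").sort_index()
new = df_new.set_index("name").sort_index()

# A cell is changed if the values differ and are not both missing
# (plain != would report NaN vs NaN as a change).
diff = (old != new) & ~(old.isna() & new.isna())
changed_rows = diff.any(axis=1)

# Keep the new version of every row with at least one changed cell.
changes = new[changed_rows]
changes.to_csv("changes.csv")  # placeholder output path
```

If both tables always contain exactly the same names, calling `reset_index(drop=True)` after sorting also makes the labels line up, although it matches rows by position rather than by name. On pandas 1.1 or later, `old.compare(new)` on the name-indexed frames is another option. It lists only the differing cells side by side and treats NaN in the same position as equal.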