Pandas error when comparing two DataFrames after sorting

I have some old tables that I need to compare with new tables, and I want to use pandas to compare them.

The data looks something like this:

data_old = [
    ["John", 1,2,3,None],
    ["Mary", 1,4,None,"Apples"],
    ["Jean", None,4,-4,"Peaches"],
]
columns = ['name', "A", "B", "C", "D"]
data_new = [
    ["Jean", 5,4,-4,"Peaches"],
    ["John", 1,-2,3,None],
    ["Mary", 1,4,None,"Apples"],
    ]

Here we have two relatively small data sets, and the values can be text, numeric, or NULL. I want to build a DataFrame containing only the rows that changed and export it to CSV.

My problem is that when I sort both DataFrames by name, I get the following error:

ValueError: Can only compare identically-labeled DataFrame objects

This is what I'm doing:

df_old = pd.DataFrame(data=data_old, columns=columns)
df_old.sort(columns='name', inplace=True)
df_new = pd.DataFrame(data=data_new, columns=columns)
df_new.sort(columns='name', inplace=True)
ne = (df_old != df_new).any(1) #ERROR
# do other stuff.....
1 answer

I think you need sort_values instead of sort, because:

FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)

The DataFrames must also have identical index labels before they can be compared, so call reset_index with drop=True:
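
The warning by itself is not what raises the ValueError. A minimal sketch of the likely cause, using the data from the question: sorting reorders the rows but keeps each row's original index label, so the two frames end up with the same labels in a different order, and pandas refuses to compare them element-wise. The variable names old_labels, new_labels and error_message are only for illustration.

```python
import pandas as pd

columns = ["name", "A", "B", "C", "D"]
data_old = [
    ["John", 1, 2, 3, None],
    ["Mary", 1, 4, None, "Apples"],
    ["Jean", None, 4, -4, "Peaches"],
]
data_new = [
    ["Jean", 5, 4, -4, "Peaches"],
    ["John", 1, -2, 3, None],
    ["Mary", 1, 4, None, "Apples"],
]

df_old = pd.DataFrame(data_old, columns=columns).sort_values("name")
df_new = pd.DataFrame(data_new, columns=columns).sort_values("name")

# Sorting moves the rows, but every row keeps its original index label.
old_labels = df_old.index.tolist()
new_labels = df_new.index.tolist()
print(old_labels, new_labels)

# The labels are the same set but in a different order, so the
# element-wise comparison is rejected.
try:
    df_old != df_new
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```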

df_old = pd.DataFrame(data=data_old, columns=columns)
df_old.sort_values('name', inplace=True)
df_new = pd.DataFrame(data=data_new, columns=columns)
df_new.sort_values('name', inplace=True)
print (df_old)
   name    A  B    C        D
2  Jean  NaN  4 -4.0  Peaches
0  John  1.0  2  3.0     None
1  Mary  1.0  4  NaN   Apples

print (df_new)
   name  A  B    C        D
0  Jean  5  4 -4.0  Peaches
1  John  1 -2  3.0     None
2  Mary  1  4  NaN   Apples

df_old.reset_index(drop=True, inplace=True)
df_new.reset_index(drop=True, inplace=True)

ne = (df_old != df_new).any(axis=1)
print (ne)
0    True
1    True
2    True
dtype: bool
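
Note that every row comes out True, including Mary, whose values are identical in both tables. That is because NaN != NaN (and None != None) evaluates to True in pandas, so any row with a missing value looks changed. Below is a rough sketch of one way around that, assuming both tables contain the same names so the rows line up after sorting. It treats two missing values as equal and then writes only the changed rows to CSV. The names both_missing, changed_cells, changed_rows and csv_text are mine, not from the question.

```python
import pandas as pd

columns = ["name", "A", "B", "C", "D"]
data_old = [
    ["John", 1, 2, 3, None],
    ["Mary", 1, 4, None, "Apples"],
    ["Jean", None, 4, -4, "Peaches"],
]
data_new = [
    ["Jean", 5, 4, -4, "Peaches"],
    ["John", 1, -2, 3, None],
    ["Mary", 1, 4, None, "Apples"],
]

# Sort by name and rebuild a clean 0..n-1 index so the frames align.
df_old = pd.DataFrame(data_old, columns=columns).sort_values("name").reset_index(drop=True)
df_new = pd.DataFrame(data_new, columns=columns).sort_values("name").reset_index(drop=True)

# A cell counts as unchanged if the values are equal or both are missing.
both_missing = df_old.isna() & df_new.isna()
changed_cells = ~((df_old == df_new) | both_missing)
changed_rows = changed_cells.any(axis=1)
print(changed_rows)

# Keep only the changed rows from the new table and export them.
# With no path, to_csv returns the CSV text; pass a file name to write a file.
changed = df_new[changed_rows]
csv_text = changed.to_csv(index=False)
print(csv_text)
```

With this data only Jean and John are reported as changed. If the two tables can contain different names, comparing by position is no longer safe and the rows would need to be matched on the name column first.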

Source: https://habr.com/ru/post/1660682/

