Assuming this is your starting point -
df
A_Benz B_Benz A_Audi B_Audi A_Honda B_Honda
1 1 0 1 1 0 0
2 1 0 0 1 0 0
3 1 0 0 1 0 0
4 1 0 1 1 1 1
5 1 0 0 1 0 0
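To reproduce the frame above, here is one way to build it. This is a sketch: the 1-based index is chosen only to match the display, and the column order follows the table.

```python
import pandas as pd

# Sample frame from the table above; index starts at 1 to match the display.
df = pd.DataFrame(
    {
        "A_Benz": [1, 1, 1, 1, 1],
        "B_Benz": [0, 0, 0, 0, 0],
        "A_Audi": [1, 0, 0, 1, 0],
        "B_Audi": [1, 1, 1, 1, 1],
        "A_Honda": [0, 0, 0, 1, 0],
        "B_Honda": [0, 0, 0, 1, 0],
    },
    index=range(1, 6),
)
print(df)
```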
Option 1
This would make a good use case for filter:
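import pandas as pd

# Select each group of columns by its prefix
i = df.filter(regex='^A_')
j = df.filter(regex='^B_')
# Strip the prefix so both frames share the labels Benz, Audi, Honda
i.columns = i.columns.str.split('_', n=1).str[-1]
j.columns = j.columns.str.split('_', n=1).str[-1]
# Subtraction aligns on column labels
(i - j).add_prefix('diff_')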
diff_Benz diff_Audi diff_Honda
1 1 0 0
2 1 -1 0
3 1 -1 0
4 1 0 0
5 1 -1 0
To add these columns back to the original frame, use concat:
df = pd.concat([df, (i - j).add_prefix('diff_')], axis=1)
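If you need this for other prefix pairs, the Option 1 steps can be wrapped in a small function. `prefix_diff` is a hypothetical helper name, not a pandas API, and it assumes column names of the form `<prefix>_<brand>`.

```python
import re

import pandas as pd


def prefix_diff(df: pd.DataFrame, left: str = "A", right: str = "B", sep: str = "_") -> pd.DataFrame:
    """Hypothetical helper: return <left>_X minus <right>_X for each suffix X."""
    # Anchored regexes, since filter(regex=...) matches anywhere in the name.
    lhs = df.filter(regex="^" + re.escape(left + sep))
    rhs = df.filter(regex="^" + re.escape(right + sep))
    # Drop the prefix so both sides share the same labels (Benz, Audi, Honda, ...).
    lhs.columns = lhs.columns.str.split(sep, n=1).str[-1]
    rhs.columns = rhs.columns.str.split(sep, n=1).str[-1]
    # Subtraction aligns on labels; a suffix present on only one side becomes NaN.
    return (lhs - rhs).add_prefix("diff" + sep)


df = pd.DataFrame(
    {
        "A_Benz": [1, 1, 1, 1, 1],
        "B_Benz": [0, 0, 0, 0, 0],
        "A_Audi": [1, 0, 0, 1, 0],
        "B_Audi": [1, 1, 1, 1, 1],
        "A_Honda": [0, 0, 0, 1, 0],
        "B_Honda": [0, 0, 0, 1, 0],
    },
    index=range(1, 6),
)
out = pd.concat([df, prefix_diff(df)], axis=1)
print(out)
```

Because the subtraction aligns on labels rather than positions, the A_ and B_ columns do not need to be in matching order.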
Option 2
Sort the columns so each A_/B_ pair is adjacent, then use diff:
import re
df = df[sorted(df.columns, key=lambda x: x.split('_', 1)[1])]
df.diff(-1, axis=1).iloc[:, ::2].rename(columns=lambda x: re.sub('A_', 'diff_', x))
diff_Audi diff_Benz diff_Honda
1 0.0 1.0 0.0
2 -1.0 1.0 0.0
3 -1.0 1.0 0.0
4 0.0 1.0 0.0
5 -1.0 1.0 0.0
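The floats come from diff itself: with diff(-1, axis=1) each column is that column minus its right neighbour, and the last column has no neighbour, so it is filled with NaN. A sketch that casts back to integers, assuming every brand has both an A_ and a B_ column and that A_ comes before B_ in the original order:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A_Benz": [1, 1, 1, 1, 1],
        "B_Benz": [0, 0, 0, 0, 0],
        "A_Audi": [1, 0, 0, 1, 0],
        "B_Audi": [1, 1, 1, 1, 1],
        "A_Honda": [0, 0, 0, 1, 0],
        "B_Honda": [0, 0, 0, 1, 0],
    },
    index=range(1, 6),
)
# Group each brand's pair together; sorted() is stable, so A_ stays before B_.
df = df[sorted(df.columns, key=lambda x: x.split("_", 1)[1])]
# Column i minus column i+1; the last column (B_Honda) has no neighbour and becomes NaN.
d = df.diff(-1, axis=1)
# Keep only the A_ positions (0, 2, 4); they hold no NaN, so the cast to int is safe.
res = d.iloc[:, ::2].astype(int).rename(columns=lambda c: c.replace("A_", "diff_", 1))
print(res)
```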
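Option 3
A variant (credit to @jpp) that sorts the same way and then subtracts the underlying NumPy arrays directly -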
c = sorted(df.columns, key=lambda x: x.split('_', 1)[1])
df = df[c]
pd.DataFrame(
df.iloc[:, ::2].values - df.iloc[:, 1::2].values, columns=c[::2]
)
A_Audi A_Benz A_Honda
0 0 1 0
1 -1 1 0
2 -1 1 0
3 0 1 0
4 -1 1 0
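The array subtraction above is purely positional and builds a fresh 0-based index. A slightly more defensive sketch, assuming the same `A_`/`B_` naming, checks that the pairs line up before subtracting and keeps the original index so the result can be concatenated back:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A_Benz": [1, 1, 1, 1, 1],
        "B_Benz": [0, 0, 0, 0, 0],
        "A_Audi": [1, 0, 0, 1, 0],
        "B_Audi": [1, 1, 1, 1, 1],
        "A_Honda": [0, 0, 0, 1, 0],
        "B_Honda": [0, 0, 0, 1, 0],
    },
    index=range(1, 6),
)
c = sorted(df.columns, key=lambda x: x.split("_", 1)[1])
a_cols, b_cols = c[::2], c[1::2]
# Positional subtraction is only correct if every even slot is an A_ column
# and its neighbour is the B_ column for the same brand.
assert all(x.startswith("A_") for x in a_cols)
assert [x.split("_", 1)[1] for x in a_cols] == [x.split("_", 1)[1] for x in b_cols]
res = pd.DataFrame(
    df[a_cols].to_numpy() - df[b_cols].to_numpy(),
    index=df.index,  # keep the original row labels
    columns=["diff_" + x.split("_", 1)[1] for x in a_cols],
)
print(res)
```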