Kind of messy but single-line:
(df.join(pd.concat({'sum': df.groupby(level=1, axis=1).sum()}, axis=1)) .sortlevel(level=1, axis=1))
Produces this for me:
ABC sum ABC \ geoffrey geoffrey geoffrey geoffrey julian julian julian 0 0.073676 0.279702 0.258112 0.611490 0.204082 0.795725 0.258185 1 0.096733 0.541198 0.757652 1.395583 0.263235 0.507324 0.525919 2 0.028613 0.364365 0.924733 1.317711 0.196243 0.653408 0.174911 3 0.831569 0.738029 0.294473 1.864071 0.528785 0.654160 0.940831 4 0.263250 0.855270 0.264698 1.383218 0.853517 0.803087 0.701937 5 0.069519 0.374411 0.846647 1.290577 0.239797 0.943544 0.189361 6 0.290414 0.873785 0.118713 1.282912 0.980734 0.850097 0.903645 7 0.088387 0.062140 0.872064 1.022591 0.591942 0.566298 0.568482 8 0.061483 0.008404 0.658370 0.728257 0.818167 0.282050 0.449198 9 0.427602 0.171458 0.234426 0.833486 0.217424 0.471933 0.390549 sum julian 0 1.257992 1 1.296478 2 1.024562 3 2.123776 4 2.358541 5 1.372702 6 2.734476 7 1.726722 8 1.549415 9 1.079906
I just said: "Here is my df
, first groups the names of people and sums them up, then attach these two summarized columns to the original df
, then use sortlevel
to sort by level=1
> and axis=1
"
So, the only reason “sum” appears after column C
only because the letter s
appears after C
If you have a column named x
, this will not work. Not sure if this is important.
Here df
I used for leisure purposes:
df = pd.DataFrame({ ('C', 'julian'): [0.258185, 0.52591899999999991, 0.17491099999999998, 0.94083099999999997, 0.70193700000000003, 0.189361, 0.90364500000000003, 0.56848199999999993, 0.44919799999999993, 0.39054899999999998], ('B', 'geoffrey'): [0.27970200000000001, 0.54119799999999996, 0.36436499999999999, 0.73802900000000005, 0.85527000000000009, 0.37441099999999999, 0.87378500000000003, 0.062140000000000001, 0.008404, 0.171458], ('A', 'julian'): [0.20408199999999999, 0.263235, 0.196243, 0.52878500000000006, 0.85351699999999997, 0.23979699999999998, 0.98073399999999999, 0.59194199999999997, 0.81816699999999998, 0.21742399999999998], ('B', 'julian'): [0.79572500000000002, 0.507324, 0.65340799999999999, 0.65416000000000007, 0.803087, 0.94354400000000005, 0.85009699999999988, 0.56629799999999997, 0.28205000000000002, 0.47193299999999999], ('A', 'geoffrey'): [0.073676000000000005, 0.096733, 0.028613, 0.831569, 0.26324999999999998, 0.069519000000000011, 0.29041400000000001, 0.088387000000000007, 0.061483000000000003, 0.42760200000000004], ('C', 'geoffrey'): [0.25811200000000001, 0.75765199999999999, 0.92473300000000003, 0.29447299999999998, 0.26469799999999999, 0.84664699999999993, 0.11871300000000001, 0.87206399999999995, 0.65837000000000001, 0.23442600000000002]}, columns=pd.MultiIndex.from_tuples([('A','julian'),('A','geoffrey'), ('B','julian'),('B','geoffrey'), ('C','julian'),('C','geoffrey')]))
Edited by:
Here's a different approach:
sum_columns = [('sum', name) for name in df.columns.levels[1].tolist()] df[sum_columns] = df.groupby(axis=1, level=1).sum() df = df.sortlevel(level=1, axis=1)
sum_columns
- it looks like this: [('sum', 'geoffrey'), ('sum', 'julian')]
.
df[sum_columns]
creates a new column "sum" for each name at level 1.
If you want to have a sum column next to the name, use sortlevel
.