You can group by the column index and take the mean:
    In [11]: df.groupby(level=0, axis=1).mean()
    Out[11]:
       bar  foo  hello
    0    1  0.5      5
    1    1  1.5      5
    2    1  2.5      5
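For anyone on a recent pandas: as far as I know, passing `axis=1` to `groupby` is deprecated from pandas 2.1 onward, so the same idea can be sketched by transposing, grouping on the (now row) labels, and transposing back. The frame below is my own reconstruction of an all-numeric input that matches the output shown above; it is an assumption, not data from the question.

```python
import pandas as pd

# Assumed input: two columns share the label "foo", everything is numeric.
df = pd.DataFrame(
    [[0, 1, 1, 5], [1, 1, 2, 5], [2, 1, 3, 5]],
    columns=["foo", "bar", "foo", "hello"],
)

# Transpose so the column labels become the row index, average the rows
# that share a label, then transpose back to the original orientation.
means = df.T.groupby(level=0).mean().T

print(means)
```

The result has one column per distinct label, sorted by label, and the two `foo` columns are collapsed into their row-wise mean.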
A slightly trickier case is when there is a non-numeric column:
    In [21]: df
    Out[21]:
       foo  bar  foo hello
    0    0    1    1     a
    1    1    1    2     a
    2    2    1    3     a
Applying the groupby above to this frame raises DataError: No numeric types to aggregate. It is definitely not going to win any performance prizes, but here is a generic way to do it in this case:
    In [22]: dupes = df.columns.get_duplicates()

    In [23]: dupes
    Out[23]: ['foo']

    In [24]: pd.DataFrame({d: df[d] for d in df.columns if d not in dupes})
    Out[24]:
       bar hello
    0    1     a
    1    1     a
    2    1     a

    In [25]: pd.concat(df.xs(d, axis=1) for d in dupes).groupby(level=0, axis=1).mean()
    Out[25]:
       foo
    0  0.5
    1  1.5
    2  2.5

    In [26]: pd.concat([Out[24], Out[25]], axis=1)
    Out[26]:
       foo  bar hello
    0  0.5    1     a
    1  1.5    1     a
    2  2.5    1     a
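The session above relies on `Index.get_duplicates()`, which, to my knowledge, was removed in pandas 1.0. Here is a rough sketch of the same split, average, and recombine idea using `Index.duplicated()` instead. The variable names (`is_dupe`, `unique_part`, `averaged`) are mine, and the frame is rebuilt from the output shown in `Out[21]`; treat it as one possible approach rather than the canonical one.

```python
import pandas as pd

# Frame rebuilt from Out[21]: "foo" appears twice, "hello" holds strings.
df = pd.DataFrame(
    [[0, 1, 1, "a"], [1, 1, 2, "a"], [2, 1, 3, "a"]],
    columns=["foo", "bar", "foo", "hello"],
)

# Labels that occur more than once, and a boolean mask over all columns
# marking every column that carries one of those labels.
dupes = df.columns[df.columns.duplicated()].unique()
is_dupe = df.columns.isin(dupes)

# Columns with unique labels are kept untouched (including non-numeric ones).
unique_part = df.loc[:, ~is_dupe]

# Columns with repeated labels are averaged per label via a transpose.
averaged = df.loc[:, is_dupe].T.groupby(level=0).mean().T

# Put the two pieces back together side by side.
result = pd.concat([averaged, unique_part], axis=1)

print(result)
```

Because the averaged block is joined column-wise, this should also cope with more than one duplicated label, provided every duplicated column is numeric.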
I think the takeaway is to avoid duplicate columns ... or perhaps that I don't know what I'm doing.