I came across pandas' silent exclusion of "nuisance" columns, as described here: Pandas Nuisance columns
It says that pandas silently excludes a column if an aggregate function cannot be applied to it.
Consider the following example:
I have a data frame:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'C': {0: -0.91985400000000006, 1: -0.042379, 2: 1.2476419999999999, 3: -0.00992,
              4: 0.290213, 5: 0.49576700000000001, 6: 0.36294899999999997, 7: 1.548106},
        'A': {0: 'foo', 1: 'bar', 2: 'foo', 3: 'bar', 4: 'foo', 5: 'bar', 6: 'foo', 7: 'foo'},
        'B': {0: -1.131345, 1: -0.089328999999999992, 2: 0.33786300000000002, 3: -0.94586700000000001,
              4: -0.93213199999999996, 5: 1.9560299999999999, 6: 0.017587000000000002, 7: -0.016691999999999999},
    })

df:

         A         B         C
    0  foo -1.131345 -0.919854
    1  bar -0.089329 -0.042379
    2  foo  0.337863  1.247642
    3  bar -0.945867 -0.009920
    4  foo -0.932132  0.290213
    5  bar  1.956030  0.495767
    6  foo  0.017587  0.362949
    7  foo -0.016692  1.548106
Let me combine columns B and C into a new column D and convert each entry to a NumPy ndarray:
    df = df.assign(D=df[['B', 'C']].values.tolist())
    df['D'] = df['D'].apply(np.array)

df:

         A         B         C                                           D
    0  foo -1.131345 -0.919854            [-1.131345, -0.9198540000000001]
    1  bar -0.089329 -0.042379           [-0.08932899999999999, -0.042379]
    2  foo  0.337863  1.247642                        [0.337863, 1.247642]
    3  bar -0.945867 -0.009920                       [-0.945867, -0.00992]
    4  foo -0.932132  0.290213                       [-0.932132, 0.290213]
    5  bar  1.956030  0.495767                         [1.95603, 0.495767]
    6  foo  0.017587  0.362949  [0.017587000000000002, 0.36294899999999997]
    7  foo -0.016692  1.548106                       [-0.016692, 1.548106]
Now I can take the mean of column D:
    print(df['D'].mean())
    print(df['B'].mean())
    print(df['C'].mean())

    [-0.10048563  0.3715655 ]
    -0.100485625
    0.3715655
But when I group by A and take the mean, column D is dropped:
    df.groupby('A').mean()

                B         C
    A
    bar  0.306945  0.147823
    foo -0.344944  0.505811
My question is: why is column D excluded even though the aggregate function can be applied to it successfully?
And also, in general, how can I use aggregate functions like mean or sum when a particular column of interest holds NumPy arrays?
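For reference, the kind of result I am after can be sketched by hand: iterate over the groups, stack each group's arrays into a 2-D array, and average over the rows. This is only a minimal sketch, not necessarily the idiomatic pandas way. The frame `demo` and its values are made up for illustration, and it assumes every array in D has the same shape.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with the same layout as df above (values are made up).
demo = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1.0, 2.0, 3.0, 4.0],
    'C': [10.0, 20.0, 30.0, 40.0],
})

# Build the array-valued column D the same way as in the question.
demo = demo.assign(D=demo[['B', 'C']].values.tolist())
demo['D'] = demo['D'].apply(np.array)

# For each group, stack its arrays into shape (n_rows, 2) and average over rows.
# Assumption: all arrays in D have the same shape, otherwise np.stack raises.
group_means = {
    key: np.stack(group['D'].to_list()).mean(axis=0)
    for key, group in demo.groupby('A')
}

for key, value in group_means.items():
    print(key, value)
```

Each value in `group_means` is a NumPy array whose entries match the per-group means of B and C, which is what I would have expected `groupby('A').mean()` to give for column D.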