Python: Pandas mistakenly excluding a column in a group

Question


I came across the automatic exclusion of "nuisance" columns in pandas, as described here: Pandas nuisance columns

It states that pandas silently excludes a column if the aggregate function cannot be applied to it.

Consider the following example:

I have a data frame:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'C': {0: -0.91985400000000006, 1: -0.042379, 2: 1.2476419999999999, 3: -0.00992,
              4: 0.290213, 5: 0.49576700000000001, 6: 0.36294899999999997, 7: 1.548106},
        'A': {0: 'foo', 1: 'bar', 2: 'foo', 3: 'bar', 4: 'foo', 5: 'bar', 6: 'foo', 7: 'foo'},
        'B': {0: -1.131345, 1: -0.089328999999999992, 2: 0.33786300000000002, 3: -0.94586700000000001,
              4: -0.93213199999999996, 5: 1.9560299999999999, 6: 0.017587000000000002, 7: -0.016691999999999999},
    })

    df:
         A         B         C
    0  foo -1.131345 -0.919854
    1  bar -0.089329 -0.042379
    2  foo  0.337863  1.247642
    3  bar -0.945867 -0.009920
    4  foo -0.932132  0.290213
    5  bar  1.956030  0.495767
    6  foo  0.017587  0.362949
    7  foo -0.016692  1.548106

Let me combine columns B and C into a new column D and convert each of its values to a NumPy ndarray:

    df = df.assign(D=df[['B', 'C']].values.tolist())
    df['D'] = df['D'].apply(np.array)

    df:
         A         B         C                                            D
    0  foo -1.131345 -0.919854             [-1.131345, -0.9198540000000001]
    1  bar -0.089329 -0.042379            [-0.08932899999999999, -0.042379]
    2  foo  0.337863  1.247642                         [0.337863, 1.247642]
    3  bar -0.945867 -0.009920                        [-0.945867, -0.00992]
    4  foo -0.932132  0.290213                        [-0.932132, 0.290213]
    5  bar  1.956030  0.495767                          [1.95603, 0.495767]
    6  foo  0.017587  0.362949  [0.017587000000000002, 0.36294899999999997]
    7  foo -0.016692  1.548106                        [-0.016692, 1.548106]

Now I can take the mean of column D:

    print(df['D'].mean())
    print(df['B'].mean())
    print(df['C'].mean())

    [-0.10048563  0.3715655 ]
    -0.100485625
    0.3715655

But when I group by A and take the mean, column D is dropped:

    df.groupby('A').mean()

                B         C
    A
    bar  0.306945  0.147823
    foo -0.344944  0.505811

My question is: why is column D excluded even though the aggregate function can be applied to it successfully?

And also, in general, how can I use aggregate functions like mean or sum when the column of interest holds NumPy arrays?


python pandas

Vikash B Feb 22 '18 at 8:29


1 answer

jezrael · Answer 1 · 2018-02-22T08:47:01+0000

It is possible, but an if-else is needed in the custom function:

    def f(x):
        a = x.mean()
        return a if isinstance(a, (float, int)) else list(a)

    df = df.groupby('A').agg(f)
    print(df)

                B         C                                 D
    A
    bar  0.306945  0.147823  [0.306944666667, 0.147822666667]
    foo -0.344944  0.505811           [-0.3449438, 0.5058112]

    df = df.groupby('A').agg(lambda x: x.mean())
    print(df)

                B         C   D
    A
    bar  0.306945  0.147823 NaN
    foo -0.344944  0.505811 NaN
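As a complement to the snippets above, here is a minimal sketch of another way to average an array-valued column per group. It is not part of the original answer. It relies on the fact that a column of ndarrays is stored with object dtype, which is likely why pandas does not treat D as numeric in groupby('A').mean(). The idea is to stack each group's arrays with NumPy and average along the first axis. The names group_means, result, D0 and D1 are made up for this illustration, and the exact behavior of groupby on object columns may vary between pandas versions.

```python
import numpy as np
import pandas as pd

# Same data as in the question (values rounded as printed there).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': [-1.131345, -0.089329, 0.337863, -0.945867,
          -0.932132, 1.956030, 0.017587, -0.016692],
    'C': [-0.919854, -0.042379, 1.247642, -0.009920,
          0.290213, 0.495767, 0.362949, 1.548106],
})

# Build D the same way as the question: one ndarray per row.
df['D'] = df[['B', 'C']].values.tolist()
df['D'] = df['D'].apply(np.array)

# The column holds Python objects, not numbers.
print(df['D'].dtype)

# For each group, stack its arrays into a 2-D array (rows x 2)
# and take the mean down the rows.
group_means = {
    key: np.stack(group['D'].tolist()).mean(axis=0)
    for key, group in df.groupby('A')
}

# Optional: spread the per-group vectors into ordinary numeric columns.
# D0 and D1 are hypothetical column names chosen for this example.
result = pd.DataFrame.from_dict(group_means, orient='index', columns=['D0', 'D1'])
print(result)
```

Keeping the result as plain numeric columns (D0, D1) rather than as lists means later aggregations and arithmetic work without a custom function.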
