Pandas: conditional group calculations

Say I have a table with a key (for example, a customer identifier) and two numeric columns, C1 and C2. I would like to group the rows by the key (customer) and run some aggregators, such as sum, over the columns. After computing the group aggregates, I would like to assign the results back to each row of the customer in the DataFrame (since some user-defined per-row features depend on them).

I see that I can do something like
df['F1'] = df.groupby(['Key'])['C1'].transform(np.sum)
if I want to aggregate only one column and be able to add the result back to the DataFrame.

Can I make this conditional? That is, can I sum column C1 within each group using only the rows whose column C2 equals some number X, and still assign the results back to the DataFrame?

How do I run an aggregator over a combination of columns, for example:
np.sum(C1 + C2)?

What would be the easiest and most elegant way to implement this? What is the most efficient way to do it? Can both aggregations be performed in a single pass?

Thanks in advance.

2 answers

Here's an example with some dummy data.

In [81]: df = pd.DataFrame({'Key': ['a','a','b','b','c','c'], 
                            'C1': [1,2,3,4,5,6],  
                            'C2': [7,8,9,10,11,12]})
In [82]: df['F1'] = df.groupby('Key')['C1'].transform(np.sum)

In [83]: df
Out[83]: 
   C1  C2 Key  F1
0   1   7   a   3
1   2   8   a   3
2   3   9   b   7
3   4  10   b   7
4   5  11   c  11
5   6  12   c  11

If you want to do a conditional GroupBy, you can just filter the DataFrame as you pass it to .groupby. For example, suppose you want the group sum of 'C1' using only the rows where C2 is less than 8 or greater than 9.

In [87]: cond = (df['C2'] < 8) | (df['C2'] > 9)

In [88]: df['F2'] = df[cond].groupby('Key')['C1'].transform(np.sum)

In [89]: df
Out[89]: 
   C1  C2 Key  F1  F2
0   1   7   a   3   1
1   2   8   a   3 NaN
2   3   9   b   7 NaN
3   4  10   b   7   4
4   5  11   c  11  11
5   6  12   c  11  11

This works because the transform operation preserves the index, so the result still aligns correctly with the original frame. Rows excluded by the filter end up as NaN.
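If the NaNs on the filtered-out rows are unwanted, one option (a sketch, assuming a zero default makes sense for your use case) is to fill them in afterwards:

```python
import pandas as pd

df = pd.DataFrame({'Key': ['a', 'a', 'b', 'b', 'c', 'c'],
                   'C1': [1, 2, 3, 4, 5, 6],
                   'C2': [7, 8, 9, 10, 11, 12]})

cond = (df['C2'] < 8) | (df['C2'] > 9)

# transform on the filtered frame keeps the original index, so the
# assignment aligns row-by-row; excluded rows receive NaN
df['F2'] = df[cond].groupby('Key')['C1'].transform('sum')

# treat the excluded rows as zero instead of missing
df['F2'] = df['F2'].fillna(0)
```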

What about aggregating over a combination of columns? One straightforward way is to add the two transforms together.

In [93]: gb = df.groupby('Key')

In [94]: df['C1+C2'] = gb['C1'].transform(np.sum) + gb['C2'].transform(np.sum)

Note that this works here because sum(C1 + C2) equals sum(C1) + sum(C2). You can also do it in a single operation with apply, though it is clumsier.

df['C1_C2'] = gb[['C1','C2']].apply(
    lambda x: pd.DataFrame(x.sum().sum(), index=x.index, columns=['']))
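A simpler single-pass alternative (a sketch of the same idea, not from the original answer): build the combined series first, then transform it by group, so the combination and the aggregation happen in one groupby:

```python
import pandas as pd

df = pd.DataFrame({'Key': ['a', 'a', 'b', 'b', 'c', 'c'],
                   'C1': [1, 2, 3, 4, 5, 6],
                   'C2': [7, 8, 9, 10, 11, 12]})

# sum(C1 + C2) per group, broadcast back onto every row;
# grouping a Series by another Series aligns on the index
df['C1+C2'] = (df['C1'] + df['C2']).groupby(df['Key']).transform('sum')
```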

Unlike transform(), apply() passes each whole group to your function as a DataFrame, so you can compute features that involve several columns at once. The result has one row per group, which you can then join back onto the input frame.

df = pd.DataFrame({'Key': ['a','a','b','b','c','c'],
                   'C1': [1,2,3,4,5,6],
                   'C2': [7,8,9,10,11,12]})

# Group g will be given as a DataFrame
def group_feature_extractor(g):
    feature_1 = (g['C1'] + g['C2']).sum()

    even_C1_filter = g['C1'] % 2 == 0
    feature_2 = g[even_C1_filter]['C2'].sum()

    return pd.Series([feature_1, feature_2], index=['F1', 'F2'])

# Group once
group = df.groupby(['Key'])

# Extract features from each group
group_features = group.apply(group_feature_extractor)

# Join with the input data frame ...
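A sketch of that join step (assuming you want each group's features repeated on every matching row, keyed by 'Key'):

```python
import pandas as pd

df = pd.DataFrame({'Key': ['a', 'a', 'b', 'b', 'c', 'c'],
                   'C1': [1, 2, 3, 4, 5, 6],
                   'C2': [7, 8, 9, 10, 11, 12]})

def group_feature_extractor(g):
    # F1: sum of C1 + C2 over the group
    feature_1 = (g['C1'] + g['C2']).sum()
    # F2: sum of C2 over the rows where C1 is even
    even_C1_filter = g['C1'] % 2 == 0
    feature_2 = g[even_C1_filter]['C2'].sum()
    return pd.Series([feature_1, feature_2], index=['F1', 'F2'])

group_features = df.groupby('Key').apply(group_feature_extractor)

# group_features is indexed by 'Key'; join on that column broadcasts
# each group's features onto every matching row of the original frame
result = df.join(group_features, on='Key')
```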

