Group DataFrame, apply the function with inputs, then add the result back to the original

Question


I couldn't find this question answered anywhere, so I'm trying here:

What I'm trying to do is modify an existing DataFrame using groupby and a self-written function:

benchmark =

x    y    z    field_1
1    1    3    a
1    2    5    b
9    2    4    a
1    2    5    c
4    6    1    c

What I want to do is group by field_1, apply the function using certain columns as input (in this case the columns x and y), and then add the result to the original DataFrame benchmark as a new column called new_field. The function itself depends on the value of field_1, i.e. field_1=a gives a different result than field_1=b, etc. (which is why I group in the first place).

The pseudocode looks something like this:

1. grouped_data = benchmark.groupby(['field_1'])
2. apply own_function to grouped_data; with inputs ('x', 'y', grouped_data)
3. add back result from function to benchmark as column 'new_field'
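The three pseudocode steps can be sketched in pandas as below. This is a minimal sketch, not the asker's code: own_function and the per-group formulas in `rules` are hypothetical placeholders for the real field_1-dependent function.

```python
import pandas as pd

benchmark = pd.DataFrame({
    "x": [1, 1, 9, 1, 4],
    "y": [1, 2, 2, 2, 6],
    "z": [3, 5, 4, 5, 1],
    "field_1": ["a", "b", "a", "c", "c"],
})

# Hypothetical per-group rules: the formula used depends on field_1.
rules = {
    "a": lambda x, y: x + y,
    "b": lambda x, y: x * y,
    "c": lambda x, y: x - y,
}

def own_function(x, y, key):
    """Apply the (hypothetical) rule for group `key` to its x and y columns."""
    return rules[key](x, y)

# Steps 1-2: group by field_1 and call own_function on each group's x and y.
# Step 3: each per-group result keeps benchmark's row labels, so after
# concatenation the assignment aligns every value with its source row.
parts = [own_function(g["x"], g["y"], key) for key, g in benchmark.groupby("field_1")]
benchmark["new_field"] = pd.concat(parts)
print(benchmark)
```

Collecting the per-group results and concatenating them avoids depending on how a given pandas version labels the output of groupby.apply.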

Thanks,

EDIT: Here is a more concrete example. I start from:

benchmark =

x    y    z    field_1
1    1    3    a
1    2    5    b
9    2    4    a
1    2    5    c
4    6    1    c

I also have another DataFrame, separate_data, with values for each x:

separate_data =

x    a    b    c
1    1    3    7
2    2    5    6
3    2    4    4
4    2    5    9
5    6    1    10

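I want to apply a function to the benchmark DataFrame that takes its values from separate_data: for each row of benchmark, field_1 selects the column of separate_data (i.e. one of a, b, c), and the row's x is used to interpolate in that column. The result is added to benchmark as a new column.

The result should look like this: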

benchmark =

x    y    z    field_1  field_new
1    1    3    a        interpolate using separate_data with x=1 and col=a
1    2    5    b        interpolate using separate_data with x=1 and col=b
9    2    4    a        ... etc
1    2    5    c        ...
4    6    1    c        ...

How can I do this?
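A minimal sketch of the interpolation described above, assuming linear interpolation with numpy's np.interp, and assuming that x values outside separate_data's x range should become NaN (np.interp would otherwise clamp them to the edge values). The helper name interpolate is hypothetical.

```python
import numpy as np
import pandas as pd

benchmark = pd.DataFrame({
    "x": [1, 1, 9, 1, 4],
    "y": [1, 2, 2, 2, 6],
    "z": [3, 5, 4, 5, 1],
    "field_1": ["a", "b", "a", "c", "c"],
})

separate_data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "a": [1, 2, 2, 2, 6],
    "b": [3, 5, 4, 5, 1],
    "c": [7, 6, 4, 9, 10],
})

def interpolate(x_values, column):
    """Linearly interpolate separate_data[column] at x_values (hypothetical helper).

    x values below or above separate_data's x range become NaN via left/right.
    """
    values = np.interp(x_values, separate_data["x"], separate_data[column],
                       left=np.nan, right=np.nan)
    # Keep benchmark's row labels so the result aligns on assignment.
    return pd.Series(values, index=x_values.index)

# The group key (a field_1 value) names the column; the group's x values are
# the points to interpolate at.
parts = [interpolate(g["x"], key) for key, g in benchmark.groupby("field_1")]
benchmark["field_new"] = pd.concat(parts)
print(benchmark)
```

np.interp requires the x column of separate_data to be increasing, which holds for the sample data.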


python pandas dataframe pandas-groupby

gussilago Jul 05 '17 at 15:45


Something like this should work:

groups = benchmark.groupby(benchmark["field_1"])    
benchmark = benchmark.join(groups.apply(your_function), on="field_1")

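Here your_function receives each group as a DataFrame and should return one value (or one row) per group; join then matches those results back to the rows of benchmark through field_1.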
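A runnable version of this pattern, assuming a hypothetical your_function that returns one number per group (here the sum of x * y). Joining a Series requires it to have a name, which becomes the new column, hence the rename.

```python
import pandas as pd

benchmark = pd.DataFrame({
    "x": [1, 1, 9, 1, 4],
    "y": [1, 2, 2, 2, 6],
    "z": [3, 5, 4, 5, 1],
    "field_1": ["a", "b", "a", "c", "c"],
})

def your_function(group):
    # Hypothetical example: a single number per group.
    return (group["x"] * group["y"]).sum()

groups = benchmark.groupby(benchmark["field_1"])
# Pass only the columns the function needs; the result is a Series indexed
# by the field_1 values, named so that join can turn it into a column.
per_group = groups[["x", "y"]].apply(your_function).rename("new_field")
benchmark = benchmark.join(per_group, on="field_1")
print(benchmark)
```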


Dan Carter Jul 05 '17 at 16:01

Here is a working example:

# Sample function that sums x and y, then append the field as string.
def func(x, y, z):
    return (x + y).astype(str) + z

benchmark['new_field'] = benchmark.groupby('field_1')\
                                  .apply(lambda x: func(x['x'], x['y'], x['field_1']))\
                                  .reset_index(level = 0, drop = True)

Result:

benchmark
Out[139]: 
   x  y  z field_1 new_field
0  1  1  3       a        2a
1  1  2  5       b        3b
2  9  2  4       a       11a
3  1  2  5       c        3c
4  4  6  1       c       10c


Flab Jul 05 '17 at 16:01


jezrael · Accepted Answer · 2017-07-05T15:58:43+0000

EDIT:

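First reshape separate_data with set_index + stack, set the index names with rename_axis and name the Series with rename.

Then apply your function per group with groupby, and finally join the result to benchmark.

The reshape step: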

separate_data1 =separate_data.set_index('x').stack().rename_axis(('x','field_1')).rename('d')
print (separate_data1)
x  field_1
1  a           1
   b           3
   c           7
2  a           2
   b           5
   c           6
3  a           2
   b           4
   c           4
4  a           2
   b           5
   c           9
5  a           6
   b           1
   c          10
Name: d, dtype: int64

Now apply a function to each group of x and field_1:

def func(x):
    #sample function   
    return x / 2 + x ** 2


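# group_keys=False: newer pandas would otherwise prepend the group keys to the index
separate_data1 = separate_data1.groupby(level=['x','field_1'], group_keys=False).apply(func)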
print (separate_data1)
x  field_1
1  a            1.5
   b           10.5
   c           52.5
2  a            5.0
   b           27.5
   c           39.0
3  a            5.0
   b           18.0
   c           18.0
4  a            5.0
   b           27.5
   c           85.5
5  a           39.0
   b            1.5
   c          105.0
Name: d, dtype: float64


benchmark = benchmark.join(separate_data1, on=['x','field_1'])
print (benchmark)

   x  y  z field_1     d
0  1  1  3       a   1.5
1  1  2  5       b  10.5
2  9  2  4       a   NaN
3  1  2  5       c  52.5
4  4  6  1       c  85.5
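Since each (x, field_1) pair occurs only once in the stacked Series, every group passed to func holds a single value. For an element-wise function like the sample one, the groupby step can therefore be skipped. The self-contained sketch below assumes such an element-wise func and reproduces the d column shown above.

```python
import pandas as pd

benchmark = pd.DataFrame({
    "x": [1, 1, 9, 1, 4],
    "y": [1, 2, 2, 2, 6],
    "z": [3, 5, 4, 5, 1],
    "field_1": ["a", "b", "a", "c", "c"],
})
separate_data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "a": [1, 2, 2, 2, 6],
    "b": [3, 5, 4, 5, 1],
    "c": [7, 6, 4, 9, 10],
})

def func(x):
    # Sample element-wise function from the answer.
    return x / 2 + x ** 2

# Long format: one value per (x, field_1) pair.
lookup = (separate_data.set_index("x")
          .stack()
          .rename_axis(["x", "field_1"])
          .rename("d"))

# One value per pair, so applying the element-wise func to the whole Series
# gives the same numbers as the groupby + apply step.
lookup = func(lookup)

# Left join on both key columns; pairs absent from separate_data (x=9) get NaN.
benchmark = benchmark.join(lookup, on=["x", "field_1"])
print(benchmark)
```

Note that this join is an exact-key lookup, not an interpolation: an x value that is not present in separate_data yields NaN.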

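If you need a new column with the same length as the original DataFrame, use transform, but it operates on one column at a time.

If the function needs several columns, use apply: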

df1 = benchmark.groupby(['field_1']).apply(func)

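The output is aggregated (one row per group), so it has to be added back to benchmark with join (a left join by default) or map.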


Which one fits depends on whether your function returns a Series or a DataFrame.
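A small sketch contrasting the two routes on the benchmark data. The columns x_mean and xy_diff and the two-column function func are hypothetical illustrations, not part of the original answer.

```python
import pandas as pd

benchmark = pd.DataFrame({
    "x": [1, 1, 9, 1, 4],
    "y": [1, 2, 2, 2, 6],
    "z": [3, 5, 4, 5, 1],
    "field_1": ["a", "b", "a", "c", "c"],
})

# transform on one column: one value per original row, assigned directly.
benchmark["x_mean"] = benchmark.groupby("field_1")["x"].transform("mean")

def func(g):
    # Hypothetical function of two columns, one number per group.
    return g["x"].sum() - g["y"].sum()

# apply on several columns: one value per group, indexed by field_1,
# so it is mapped back onto the rows through the field_1 column.
per_group = benchmark.groupby("field_1")[["x", "y"]].apply(func)
benchmark["xy_diff"] = benchmark["field_1"].map(per_group)
print(benchmark)
```

A left join on field_1, as in the first answer, would give the same xy_diff column as map here.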
