Subtract subgroup averages from individual values without using a loop

I have a dataframe with multiple columns, two of which are grouping variables.

>>> df2
   Groupvar1  Groupvar2         x         y         z
0          A          1  0.726317  0.574514  0.700475
1          A          2  0.422089  0.798931  0.191157
2          A          3  0.888318  0.658061  0.686496
....
13         B          2  0.978920  0.764266  0.673941
14         B          3  0.759589  0.162488  0.698958

and I want to create a new dataframe that contains the difference between each data point in the original df and the average value of its subgroup.

So I started by creating a new df with the grouped averages:

>>> grp_vars = ['Groupvar1','Groupvar2']
>>> df2_grp = df2.groupby(grp_vars)
>>> df2_grp_avg = df2_grp.mean()
>>> df2_grp_avg
                            x         y         z
Groupvar1 Groupvar2                              
A         1          0.364533  0.645237  0.886286
          2          0.325533  0.500077  0.246287
          3          0.796326  0.496950  0.510085
          4          0.774854  0.688732  0.487547
B         1          0.743783  0.452482  0.612006
          2          0.575687  0.396902  0.446126
          3          0.473152  0.476379  0.508060
          4          0.434320  0.406458  0.382187

and in the new dataframe I want to keep the deltas, defined as:

delta = individual value - average value of the subgroup the individual belongs to
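Starting from a grouped-averages frame like df2_grp_avg, one loop-free way to apply this definition is to join each row to its subgroup's averages on the two group keys and subtract. A minimal sketch, assuming a small toy frame with invented values and only two value columns (x, y):

```python
import pandas as pd

# Hypothetical toy data laid out like df2 (values invented for illustration).
df2 = pd.DataFrame({
    "Groupvar1": ["A", "A", "A", "B", "B", "B"],
    "Groupvar2": [1, 1, 2, 1, 1, 2],
    "x": [1.0, 3.0, 5.0, 2.0, 4.0, 6.0],
    "y": [0.0, 2.0, 1.0, 10.0, 20.0, 7.0],
})
grp_vars = ["Groupvar1", "Groupvar2"]
value_cols = ["x", "y"]

# Per-subgroup averages, indexed by (Groupvar1, Groupvar2), as in the question.
df2_grp_avg = df2.groupby(grp_vars)[value_cols].mean()

# Left-join each row to its subgroup's averages; the overlapping column
# names coming from the averages frame get an "_avg" suffix (x_avg, y_avg).
joined = df2.join(df2_grp_avg, on=grp_vars, rsuffix="_avg")

# delta = individual value - average of the subgroup the row belongs to
avg_cols = [c + "_avg" for c in value_cols]
deltas = joined[value_cols] - joined[avg_cols].to_numpy()
print(deltas)
```

DataFrame.join keeps the row order and index of df2, so each delta stays on the same row as the value it came from.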

Now, it’s clear to me how to do this with a loop, but I suppose there should be a more elegant solution. I'd appreciate any advice on finding it. TIA.


Use .groupby(...).transform:

>>> demean = lambda df: df - df.mean()
>>> df.groupby(['Groupvar1', 'Groupvar2']).transform(demean)
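A self-contained sketch of this answer, assuming a toy frame with invented values in the question's layout. It also shows a variant that is not from the answer: passing the string "mean" to transform, which broadcasts each subgroup's mean back to every row so the subtraction can be written explicitly.

```python
import pandas as pd

# Hypothetical toy frame shaped like the question's df2 (values invented).
df = pd.DataFrame({
    "Groupvar1": ["A", "A", "A", "B", "B", "B"],
    "Groupvar2": [1, 1, 2, 1, 1, 2],
    "x": [1.0, 3.0, 5.0, 2.0, 4.0, 6.0],
    "y": [0.0, 2.0, 1.0, 10.0, 20.0, 7.0],
})
grp_vars = ["Groupvar1", "Groupvar2"]

# The answer's approach: subtract each subgroup's mean from its own rows.
# transform returns a result aligned to df's original row index and does
# not include the grouping columns themselves.
demean = lambda g: g - g.mean()
deltas = df.groupby(grp_vars).transform(demean)

# Variant (not from the answer): let pandas broadcast its built-in "mean"
# to every row, then subtract explicitly instead of calling a Python lambda.
value_cols = ["x", "y"]
deltas_alt = df[value_cols] - df.groupby(grp_vars)[value_cols].transform("mean")

print(deltas)
```

Both forms produce the same deltas here; the string variant lets pandas use its built-in mean, which tends to matter only on large frames.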

and, if you want to keep the grouping columns, pd.concat them with the result.
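Because transform leaves the grouping columns out of its result, one way to use pd.concat here is to put them back next to the deltas. A minimal sketch, assuming the same kind of toy frame with invented values:

```python
import pandas as pd

grp_vars = ["Groupvar1", "Groupvar2"]

# Hypothetical toy frame shaped like the question's df2 (values invented).
df = pd.DataFrame({
    "Groupvar1": ["A", "A", "A", "B", "B", "B"],
    "Groupvar2": [1, 1, 2, 1, 1, 2],
    "x": [1.0, 3.0, 5.0, 2.0, 4.0, 6.0],
    "y": [0.0, 2.0, 1.0, 10.0, 20.0, 7.0],
})

# Demeaned values per subgroup; the grouping columns are not in this result.
deltas = df.groupby(grp_vars).transform(lambda g: g - g.mean())

# transform keeps df's row index, so concatenating along columns (axis=1)
# lines each delta row up with its original grouping values.
result = pd.concat([df[grp_vars], deltas], axis=1)
print(result)
```

Concatenating on axis=1 aligns on the row index, which is why it is safe to rely on transform preserving the original index.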


Source: https://habr.com/ru/post/1548140/

