How to count the number of records related to each group?

This is my data frame:

df = 

GROUP    GRADE   TOTAL_SERVICE_TIME    TOTAL_WAIT_TIME
AAA      1       45                    20
AAA      4       40                    23
AAA      5       35                    21
BBB      2       30                    24
BBB      3       55                    22

I want to group the records by GROUP and GRADE, compute the mean TOTAL_SERVICE_TIME and the mean TOTAL_WAIT_TIME for each group, and also count the number of records that belong to each group.

I do not know how to do the counting:

output = (df.groupby(['GROUP','GRADE'])
            .agg({'TOTAL_SERVICE_TIME' : 'mean', 'TOTAL_WAIT_TIME' : 'mean'})
            .value_counts()
            .reset_index())

I also tried adding 'COUNT' : 'count' to the dictionary, but that only works if a column named COUNT already exists.

2 answers

You are close, and the documentation on agg covers this case: pass a list of functions for a column to apply more than one aggregation to it.

df.groupby(['GROUP','GRADE']).agg({'TOTAL_SERVICE_TIME' : 'mean',
                                   'TOTAL_WAIT_TIME' : ['mean', 'count']})
Out[45]: 
            TOTAL_WAIT_TIME       TOTAL_SERVICE_TIME
                       mean count               mean
GROUP GRADE                                         
AAA   1                  20     1                 45
      4                  23     1                 40
      5                  21     1                 35
BBB   2                  24     1                 30
      3                  22     1                 55
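
For a flat result with a single COUNT column, as the question asks, one possible sketch is below. It rebuilds the sample frame from the question, takes the per-group means, and adds the group sizes with size(). The variable names grouped and output and the column name COUNT are illustrative choices, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "GROUP": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "GRADE": [1, 4, 5, 2, 3],
    "TOTAL_SERVICE_TIME": [45, 40, 35, 30, 55],
    "TOTAL_WAIT_TIME": [20, 23, 21, 24, 22],
})

grouped = df.groupby(["GROUP", "GRADE"])

# Mean of both time columns per (GROUP, GRADE) pair.
output = grouped[["TOTAL_SERVICE_TIME", "TOTAL_WAIT_TIME"]].mean()

# size() counts rows per group, including rows with missing values;
# 'count' would count only the non-null values of a given column.
output["COUNT"] = grouped.size()

# Turn the GROUP/GRADE index back into ordinary columns.
output = output.reset_index()
print(output)
```

Because size() does not depend on any particular column, it avoids having to pick one of the time columns just to count rows.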

I would like to extend @Boud's great answer with another example that also lets you specify the names of the resulting columns:

In [57]: funcs = {
    ...:   'TOTAL_SERVICE_TIME': {'mean_service':'mean', 'count_service':'size'},
    ...:   'TOTAL_WAIT_TIME' : {'mean_wait':'mean', 'count_wait':'size'}
    ...: }
    ...:

In [58]: df
Out[58]:
  GROUP  GRADE  TOTAL_SERVICE_TIME  TOTAL_WAIT_TIME
0   AAA      1                  45               20
1   AAA      1                 100              100
2   AAA      4                  40               23
3   AAA      5                  35               21
4   BBB      2                  30               24
5   BBB      3                  55               22

In [59]: df.groupby(['GROUP','GRADE']).agg(funcs)
Out[59]:
            TOTAL_SERVICE_TIME               TOTAL_WAIT_TIME
                  mean_service count_service      count_wait mean_wait
GROUP GRADE
AAA   1                   72.5             2               2        60
      4                   40.0             1               1        23
      5                   35.0             1               1        21
BBB   2                   30.0             1               1        24
      3                   55.0             1               1        22

To get rid of the top level of the column index:

x = df.groupby(['GROUP','GRADE']).agg(funcs)
x.columns = x.columns.droplevel(0)


In [63]: x
Out[63]:
             mean_service  count_service  count_wait  mean_wait
GROUP GRADE
AAA   1              72.5              2           2         60
      4              40.0              1           1         23
      5              35.0              1           1         21
BBB   2              30.0              1           1         24
      3              55.0              1           1         22

In [64]: x.reset_index()
Out[64]:
  GROUP  GRADE  mean_service  count_service  count_wait  mean_wait
0   AAA      1          72.5              2           2         60
1   AAA      4          40.0              1           1         23
2   AAA      5          35.0              1           1         21
3   BBB      2          30.0              1           1         24
4   BBB      3          55.0              1           1         22
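
Note that the nested-dictionary renaming used in funcs was deprecated in pandas 0.20 and raises a SpecificationError in pandas 1.0 and later. On pandas 0.25 or newer, a rough equivalent can be sketched with named aggregation, where each keyword is the output column name and its value is a (source column, function) pair. The output names below simply mirror the ones used above.

```python
import pandas as pd

# Same data as in Out[58] above.
df = pd.DataFrame({
    "GROUP": ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB"],
    "GRADE": [1, 1, 4, 5, 2, 3],
    "TOTAL_SERVICE_TIME": [45, 100, 40, 35, 30, 55],
    "TOTAL_WAIT_TIME": [20, 100, 23, 21, 24, 22],
})

# Named aggregation: output_name=(source_column, aggregation).
x = (
    df.groupby(["GROUP", "GRADE"])
      .agg(
          mean_service=("TOTAL_SERVICE_TIME", "mean"),
          count_service=("TOTAL_SERVICE_TIME", "size"),
          mean_wait=("TOTAL_WAIT_TIME", "mean"),
          count_wait=("TOTAL_WAIT_TIME", "size"),
      )
      .reset_index()
)
print(x)
```

The result already has flat column names, so no droplevel step is needed.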

Source: https://habr.com/ru/post/1668420/

