How to collect column values after complex pandas aggregation

I am doing some non-trivial aggregation, as in the following:

aggregations = {
    'x_TmId': { 
        'Trays': 'nunique',  
        'Orderlines': 'count', 
    },
    'x_Qty': 'sum'
}

  newdf = pick.groupby(['Date','x_OrderId']).agg(aggregations).reset_index(True)

At this point, the aggregated data columns can be listed as usual:

  newdf.columns

but this returns something that I have not encountered before: a MultiIndex object:

MultiIndex(levels=[['x_TmId', 'x_Qty', 'x_OrderId'], ['Orderlines', 'Trays', 'sum', '']],
           labels=[[2, 0, 0, 1], [3, 0, 1, 2]])

At this point I realize that I don't know how to rename the new columns ("sum", for example). There is probably a similar question on Stack Overflow, but I haven't found it yet.

1 answer

I think the simplest way is to select with a tuple in the MultiIndex columns:

a = newdf[('x_Qty', 'sum')]
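For illustration, here is a small self-contained sketch (synthetic values, column names taken from the question) showing tuple selection on MultiIndex columns:

```python
import pandas as pd

# Build a frame whose columns form a MultiIndex, like the output of
# the dict-of-dict .agg() call in the question (values are made up).
cols = pd.MultiIndex.from_tuples(
    [('x_TmId', 'Trays'), ('x_TmId', 'Orderlines'), ('x_Qty', 'sum')]
)
newdf = pd.DataFrame([[1, 1, 4], [1, 1, 1]], columns=cols)

# A plain tuple addresses one column of the MultiIndex.
a = newdf[('x_Qty', 'sum')]
print(a.tolist())  # [4, 1]
```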

Another solution with slicers:

idx = pd.IndexSlice
print (newdf.loc[:, idx['x_Qty', 'sum']])
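Slicers pay off when you want several columns at once; a sketch with synthetic data and the question's column names:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [('x_TmId', 'Trays'), ('x_TmId', 'Orderlines'), ('x_Qty', 'sum')]
)
newdf = pd.DataFrame([[1, 2, 4], [1, 3, 1]], columns=cols)

idx = pd.IndexSlice
# Select every second-level column under 'x_TmId' in one go.
sub = newdf.loc[:, idx['x_TmId', :]]
print(list(sub.columns))  # [('x_TmId', 'Trays'), ('x_TmId', 'Orderlines')]
```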

But with pandas 0.20.1 we get a warning:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)

So rename the columns after aggregation instead:

aggregations = {
    'x_TmId': ['nunique', 'count'],
    'x_Qty': 'sum'
}

newdf = pick.groupby(['Date','x_OrderId']).agg(aggregations).reset_index(True)
d = {'nunique':'Trays','count':'Orderlines'}
newdf = newdf.rename(columns=d)
print (newdf)
           x_OrderId x_TmId            x_Qty
                      Trays Orderlines   sum
Date                                        
2017-10-01         9      1          1     4
2017-10-02         4      1          1     1
2017-10-03         0      1          1     3
2017-10-04         1      1          1     6
2017-10-05         9      1          1     5
2017-10-06         0      1          1     3
2017-10-07         1      1          1     9
2017-10-08         8      1          1     6
2017-10-09         9      1          1     9
2017-10-10         0      1          1     1

Or flatten the MultiIndex by joining the level values:

aggregations = {
    'x_TmId': ['nunique', 'count'],
    'x_Qty': 'sum'
}

newdf = pick.groupby(['Date','x_OrderId']).agg(aggregations)
newdf.columns = newdf.columns.map('_'.join)
d = {'x_TmId_nunique':'Trays','x_TmId_count':'Orderlines'}
newdf = newdf.reset_index().rename(columns=d)
print (newdf)
        Date  x_OrderId  Trays  Orderlines  x_Qty_sum
0 2017-10-01          9      1           1          4
1 2017-10-02          4      1           1          1
2 2017-10-03          0      1           1          3
3 2017-10-04          1      1           1          6
4 2017-10-05          9      1           1          5
5 2017-10-06          0      1           1          3
6 2017-10-07          1      1           1          9
7 2017-10-08          8      1           1          6
8 2017-10-09          9      1           1          9
9 2017-10-10          0      1           1          1

print (newdf['x_Qty_sum'])
0    4
1    1
2    3
3    6
4    5
5    3
6    9
7    6
8    9
9    1
Name: x_Qty_sum, dtype: int32
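As a side note, newer pandas versions (0.25+) offer named aggregation, which avoids both the deprecated dict-of-dict form and the flattening step. A sketch with synthetic data, assuming the same column names as the question:

```python
import pandas as pd

pick = pd.DataFrame({
    'Date': ['2017-10-01', '2017-10-01', '2017-10-02'],
    'x_OrderId': [9, 9, 4],
    'x_TmId': ['a', 'a', 'b'],
    'x_Qty': [1, 3, 1],
})

# Named aggregation: output_name=('input_column', 'func') gives flat,
# already-renamed columns in one step.
newdf = (pick.groupby(['Date', 'x_OrderId'])
             .agg(Trays=('x_TmId', 'nunique'),
                  Orderlines=('x_TmId', 'count'),
                  Qty=('x_Qty', 'sum'))
             .reset_index())
print(list(newdf.columns))
# ['Date', 'x_OrderId', 'Trays', 'Orderlines', 'Qty']
```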

Source: https://habr.com/ru/post/1677071/

