How to group by column and copy all group values into one row in pandas?

Question

This is a sample of my dataset:

Consumer_num | billed_units  
29           | 984
29           | 1244
29           | 2323
29           | 1232
29           | 1150
30           | 3222
30           | 1444
30           | 2124

I want to group by Consumer_num and then put all the billed_units values of each group into new columns, one column per value. My required result is:

Consumer_num | month 1 | month 2 | month 3 | month 4 | month 5
29           | 984     | 1244    | 2323    | 1232    | 1150
30           | 3222    | 1444    | 2124    | NaN     | NaN

This is what I have done so far:

group = df.groupby('Consumer_num')['billed_units'].unique()
group[group.apply(lambda x: len(x)>1)]
df = group.to_frame()
print(df)

Output:

Consumer_num | billed_units  
29           | [984,1244,2323,1232,1150]
30           | [3222,1444,2124]

I do not know if my approach is right. If this is correct, then I would like to know how I can separate the billed_units of each consumer, and then add to the new columns, as I showed in my required output. Or is there a better method to achieve my desired result?

python pandas

user7018778 Apr 14 '17 at 3:51

2 answers

Using pivot:

In [70]: dfm = df.assign(m=df.groupby('Consumer_num').cumcount().add(1))

In [71]: dfm.pivot(index='Consumer_num', columns='m', values='billed_units').add_prefix('month ')
Out[71]:
m             month 1  month 2  month 3  month 4  month 5
Consumer_num
29              984.0   1244.0   2323.0   1232.0   1150.0
30             3222.0   1444.0   2124.0      NaN      NaN

In [75]: df
Out[75]:
   Consumer_num  billed_units
0            29           984
1            29          1244
2            29          2323
3            29          1232
4            29          1150
5            30          3222
6            30          1444
7            30          2124

In [76]: dfm
Out[76]:
   Consumer_num  billed_units  m
0            29           984  1
1            29          1244  2
2            29          2323  3
3            29          1232  4
4            29          1150  5
5            30          3222  1
6            30          1444  2
7            30          2124  3
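
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the data in the question; the variable name `wide` is my own and not part of the answer. Keyword arguments are used for `pivot` because recent pandas releases no longer accept positional ones there.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Consumer_num': [29, 29, 29, 29, 29, 30, 30, 30],
    'billed_units': [984, 1244, 2323, 1232, 1150, 3222, 1444, 2124],
})

# Number the rows within each consumer: 1, 2, 3, ...
dfm = df.assign(m=df.groupby('Consumer_num').cumcount().add(1))

# One row per consumer, one column per position; 'wide' is an illustrative name.
wide = dfm.pivot(
    index='Consumer_num', columns='m', values='billed_units'
).add_prefix('month ')

print(wide)
```

Consumers with fewer rows than the longest group get NaN in the trailing columns, which is why the values come back as floats.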

Zero 15 . '17 7:27

piRSquared · Accepted Answer · 2017-04-14T03:58:44+0000

Solution

c = 'Consumer_num'
m = 'month {}'.format
df.set_index(
    [c, df.groupby(c).cumcount() + 1]
).billed_units.unstack().rename(columns=m).reset_index()

   Consumer_num  month 1  month 2  month 3  month 4  month 5
0            29    984.0   1244.0   2323.0   1232.0   1150.0
1            30   3222.0   1444.0   2124.0      NaN      NaN

How it works

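- Put 'Consumer_num' in a variable c for convenience.
- Define a mapper m that turns a position into a column name such as 'month 1'.
- Set the index to two levels, which makes a pd.MultiIndex:
  - the first level is the consumer number
  - the second level is the groupby cumcount (plus 1), which is the level that gets unstacked
- unstack to move that inner level into the columns.
- Rename the columns with the mapper, then reset_index.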
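
The steps above can be sketched one at a time, which makes the intermediate objects easier to inspect. This is only an unrolled version of the chained expression; the names `pos`, `indexed`, `wide` and `result` are mine, and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Consumer_num': [29, 29, 29, 29, 29, 30, 30, 30],
    'billed_units': [984, 1244, 2323, 1232, 1150, 3222, 1444, 2124],
})

c = 'Consumer_num'

# Step 1: position of each row within its group, starting at 1.
pos = df.groupby(c).cumcount() + 1

# Step 2: a two-level index (consumer, position). Passing a column name
# together with a Series builds a pd.MultiIndex.
indexed = df.set_index([c, pos])

# Step 3: pick the values and move the inner index level into the columns.
wide = indexed['billed_units'].unstack()

# Step 4: rename 1, 2, ... to 'month 1', 'month 2', ... and flatten the index.
result = wide.rename(columns='month {}'.format).reset_index()

print(result)
```

Printing `indexed` or `wide` after each step is a quick way to see what the MultiIndex looks like before and after `unstack`.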

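To keep only the first 3 months rather than all 5, either slice the columns with iloc after unstacking (first snippet below) or take the first 3 rows of each group with head before reshaping (second snippet).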

c = 'Consumer_num'
m = 'month {}'.format
df.set_index(
    [c, df.groupby(c).cumcount() + 1]
).billed_units.unstack().rename(columns=m).iloc[:, :3].reset_index()
#                                         ^..........^

   Consumer_num  month 1  month 2  month 3
0            29    984.0   1244.0   2323.0
1            30   3222.0   1444.0   2124.0

c = 'Consumer_num'
m = 'month {}'.format
d1 = df.groupby(c).head(3)  # pre-process and take just first 3
d1.set_index(
    [c, d1.groupby(c).cumcount() + 1]
).billed_units.unstack().rename(columns=m).reset_index()
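
As a rough check, the two ways of limiting to 3 months can be run side by side on the sample data. This is a sketch under the assumption that the frame looks like the one in the question; `by_iloc`, `by_head` and `same` are illustrative names. The comparison looks at labels and values only, because the dtypes can differ: the first variant is sliced from a result that contained NaN, so I would expect floats there, while the second never sees a missing value.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Consumer_num': [29, 29, 29, 29, 29, 30, 30, 30],
    'billed_units': [984, 1244, 2323, 1232, 1150, 3222, 1444, 2124],
})

c = 'Consumer_num'
m = 'month {}'.format

# Variant 1: reshape everything, then keep the first 3 month columns.
by_iloc = df.set_index(
    [c, df.groupby(c).cumcount() + 1]
).billed_units.unstack().rename(columns=m).iloc[:, :3].reset_index()

# Variant 2: keep the first 3 rows of each group, then reshape.
d1 = df.groupby(c).head(3)
by_head = d1.set_index(
    [c, d1.groupby(c).cumcount() + 1]
).billed_units.unstack().rename(columns=m).reset_index()

# Compare column labels and values, ignoring dtype.
same = bool(
    list(by_iloc.columns) == list(by_head.columns)
    and (by_iloc.to_numpy() == by_head.to_numpy()).all()
)
print(same)
```

The head variant does less work on large frames because rows beyond the third are dropped before the reshape.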
