Pandas column data frame sum and result collection

Question

For the following DataFrame:

import pandas as pd
p1 = {'name': 'willy', 'age': 11, 'interest': "Lego"}
p2 = {'name': 'willy', 'age': 11, 'interest': "games"}
p3 = {'name': 'zoe', 'age': 9, 'interest': "cars"}
df = pd.DataFrame([p1, p2, p3])
df

    age interest    name
0   11  Lego        willy
1   11  games       willy
2   9   cars        zoe

I want to count the interests of each person, with each person appearing only once in the result. I do the following:

Interests = df[['age', 'name', 'interest']].groupby(['age' , 'name']).count()
Interests.reset_index(inplace=True)
Interests.sort_values('interest', ascending=False, inplace=True)  # DataFrame.sort was removed in later pandas versions
Interests

    age name    interest
1   11  willy   2
0   9   zoe     1

It works, but I feel like I'm doing it wrong. Right now I am using the “interest” column to hold my count values, which is fine, but as I said, I expect there is a better way to do this.

I have seen many questions about counting and summing in pandas, but for me the key part is leaving out the “duplicates”.

python pandas

Lam Nov 03 '15 at 16:37

2 answers

In [2]: df
Out[2]: 
   age interest   name
0   11     Lego  willy
1   11    games  willy
2    9     cars    zoe

In [3]: for name,group in df.groupby('name'):
   ...:     print(name)
   ...:     print(group.interest.count())
   ...:     
willy
2
zoe
1
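
The loop above can also be collapsed into a single aggregation. This is a minimal sketch that rebuilds the question's DataFrame; the variable name per_name is only illustrative and does not appear in the original answer.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
p1 = {'name': 'willy', 'age': 11, 'interest': "Lego"}
p2 = {'name': 'willy', 'age': 11, 'interest': "games"}
p3 = {'name': 'zoe', 'age': 9, 'interest': "cars"}
df = pd.DataFrame([p1, p2, p3])

# Count the non-NaN interests per name in one call instead of looping.
per_name = df.groupby('name')['interest'].count()

# Print one "name count" line per person.
for name, n in per_name.items():
    print(name, n)
```

The result is a Series indexed by name, so each person appears exactly once.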

Angelo Nov 03 '15 at 16:46

Andy Hayden · Accepted Answer · 2015-11-03T16:44:23+0000

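You can use size (the length of each group) rather than count, which counts the non-NaN entries in each column of the group.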

In [11]: df[['age', 'name', 'interest']].groupby(['age' , 'name']).size()
Out[11]:
age  name
9    zoe      1
11   willy    2
dtype: int64

In [12]: df[['age', 'name', 'interest']].groupby(['age' , 'name']).size().reset_index(name='count')
Out[12]:
   age   name  count
0    9    zoe      1
1   11  willy      2
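
To make the difference between size and count concrete, here is a small sketch. It assumes a variant of the question's data in which one interest is missing; that NaN row is hypothetical and not part of the original DataFrame.

```python
import numpy as np
import pandas as pd

# Variant of the question's data with one missing interest (hypothetical).
df = pd.DataFrame([
    {'name': 'willy', 'age': 11, 'interest': 'Lego'},
    {'name': 'willy', 'age': 11, 'interest': np.nan},
    {'name': 'zoe', 'age': 9, 'interest': 'cars'},
])

grouped = df.groupby(['age', 'name'])

# size() counts the rows in each group, NaN or not.
sizes = grouped.size().reset_index(name='count')

# count() only counts the non-NaN entries of the selected column.
counts = grouped['interest'].count().reset_index(name='count')

# Sort so the person with the most rows comes first, as in the question.
sizes_sorted = sizes.sort_values('count', ascending=False)
print(sizes_sorted)
print(counts)
```

With this data, size reports two rows for willy while count reports one, so size is the safer choice when the goal is the number of rows per person.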
