Given the following DataFrame:
import pandas as pd
p1 = {'name': 'willy', 'age': 11, 'interest': "Lego"}
p2 = {'name': 'willy', 'age': 11, 'interest': "games"}
p3 = {'name': 'zoe', 'age': 9, 'interest': "cars"}
df = pd.DataFrame([p1, p2, p3])
df
   age interest   name
0   11     Lego  willy
1   11    games  willy
2    9     cars    zoe
I want to count the interests of each person and have each person appear only once in the result. I do the following:
Interests = df[['age', 'name', 'interest']].groupby(['age', 'name']).count()
Interests.reset_index(inplace=True)
# DataFrame.sort() was removed in newer pandas versions; sort_values() replaces it
Interests.sort_values('interest', ascending=False, inplace=True)
Interests
   age   name  interest
1   11  willy         2
0    9    zoe         1
It works, but I feel like I'm doing it wrong. Right now I am reusing the "interest" column to hold the counts, which is fine, but as I said, I expect there is a better way to do this.
I have seen a lot of questions about counting / summing in pandas, but for me the part where I leave out the "duplicates" is the key.
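For comparison, here is a minimal sketch of one possible alternative, assuming a reasonably recent pandas version. It counts rows per group with size() instead of reusing the "interest" column. The column name interest_count is made up for illustration and is not part of the code above.

```python
import pandas as pd

df = pd.DataFrame([
    {'name': 'willy', 'age': 11, 'interest': 'Lego'},
    {'name': 'willy', 'age': 11, 'interest': 'games'},
    {'name': 'zoe', 'age': 9, 'interest': 'cars'},
])

# size() counts the rows in each (age, name) group, so every person
# appears once. reset_index(name=...) turns the resulting Series back
# into a DataFrame and names the count column (hypothetical name).
counts = (
    df.groupby(['age', 'name'])
      .size()
      .reset_index(name='interest_count')
      .sort_values('interest_count', ascending=False)
)
print(counts)
```

Unlike count(), size() also includes rows where "interest" is missing, so the two can differ if the data contains NaN values.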