How to sum nlargest() integers in groupby

Question

I have a dataframe like this:

  Index STNAME COUNTY COUNTY_POP
  0     AL     0      100
  1     AL     1      150
  2     AL     3      200
  3     AL     5      50
  ...
  15    CA     0      300
  16    CA     1      200
  17    CA     3      250
  18    CA     4      350

I want to sum the three largest COUNTY_POP values for each state. So far I have:

  In[]: df.groupby(['STNAME'])['COUNTY_POP'].nlargest(3)
  Out[]:
  Index STNAME COUNTY COUNTY_POP
  0     AL     0      100
  1     AL     1      150
  2     AL     3      200
  ...
  15    CA     0      300
  17    CA     3      250
  18    CA     4      350

However, when I append .sum() to the code above, I get a single number instead of one sum per state:

  In[]: df.groupby(['STNAME'])['COUNTY_POP'].nlargest(3).sum()
  Out[]: 1350

I am relatively new to Python and Pandas. If anyone could explain what causes this and how to fix it, I would really appreciate it!

python pandas group-by dataframe

IMLD Nov 09 '16 at 10:53

2 answers

Presort and slice; this is a little faster:

 df.sort_values('COUNTY_POP').groupby('STNAME').COUNTY_POP \
   .apply(lambda x: x.values[-3:].sum())

 STNAME
 AL    450
 CA    900
 Name: COUNTY_POP, dtype: int64
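For anyone who wants to run the presort-and-slice idea end to end, here is a minimal sketch. The frame contains only the rows visible in the question (the rows elided by "..." are left out), and the names `presorted` and `top3_by_slice` are mine, chosen for illustration.

```python
import pandas as pd

# Only the rows shown in the question; the elided rows ("...") are omitted.
df = pd.DataFrame({
    "STNAME": ["AL", "AL", "AL", "AL", "CA", "CA", "CA", "CA"],
    "COUNTY": [0, 1, 3, 5, 0, 1, 3, 4],
    "COUNTY_POP": [100, 150, 200, 50, 300, 200, 250, 350],
})

# Sort ascending once; groupby keeps that row order inside each group,
# so the last three values of every group are its three largest.
presorted = df.sort_values("COUNTY_POP").groupby("STNAME")["COUNTY_POP"]
top3_by_slice = presorted.apply(lambda s: s.to_numpy()[-3:].sum())

print(top3_by_slice)
```

Slicing with `[-3:]` also copes with groups that have fewer than three rows: it simply sums whatever is there.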

piRSquared Nov 09 '16 at 23:40

Maxu · Accepted Answer · 2016-11-09T22:56:05+0000

Is this what you want?

 In [25]: df.groupby('STNAME')['COUNTY_POP'].agg(lambda x: x.nlargest(3).sum())
 Out[25]:
 STNAME
 AL    450
 CA    900
 Name: COUNTY_POP, dtype: int64
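As for what causes the original result: `groupby(...)['COUNTY_POP'].nlargest(3)` returns one flat Series (indexed by state and original row label), not a grouped object, so a trailing `.sum()` adds up every value in it. The sketch below reproduces that on the rows visible in the question (the elided rows are left out) and shows two ways to get one sum per state. The variable names are illustrative only.

```python
import pandas as pd

# Only the rows shown in the question; the elided rows ("...") are omitted.
df = pd.DataFrame({
    "STNAME": ["AL", "AL", "AL", "AL", "CA", "CA", "CA", "CA"],
    "COUNTY": [0, 1, 3, 5, 0, 1, 3, 4],
    "COUNTY_POP": [100, 150, 200, 50, 300, 200, 250, 350],
})

# One flat Series with a two-level index: (STNAME, original row label).
top3 = df.groupby("STNAME")["COUNTY_POP"].nlargest(3)

# The grouping is gone at this point, so .sum() totals all six values.
grand_total = top3.sum()

# Fix 1: take the top three and sum them inside each group.
per_state = df.groupby("STNAME")["COUNTY_POP"].agg(lambda s: s.nlargest(3).sum())

# Fix 2: regroup the flat result on its first index level (STNAME).
per_state_alt = top3.groupby(level=0).sum()

print(grand_total)
print(per_state)
```

Both fixes give the same per-state sums; the second one is handy if you already have the `nlargest` result and just want to collapse it.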
