Sort Pandas DataFrames

I have a Pandas Dataframe with machine data. I want to find the best-selling two Models for each Creator, and then rank the Creators descending.

Maker Model No Sold(,000s) Ford Kuga 35 Ford Focus 47 Ford Ka 31 Ford Fiesta 68 Ford Mondeo 55 Ford S-Max 34 Ford Galaxy 23 Nissan Leaf 28 Nissan Micra 31 Nissan Note 43 Nissan Pulsar 23 Nissan Juke 57 Nissan Qashqai 62 Nissan X-Trail 38 Honda Jazz 24 Honda Civic 32 Honda HRV 33 Honda CRV 29 Honda Accord 30 Honda NSX 15 Toyota Aygo 44 Toyota Auris 45 Toyota Avensis 35 Toyota Prius 32 Toyota Rav4 29 Toyota Land Cruiser 14 Citroen C1 40 Citroen C3 25 Citroen C4 46 Citroen DS3 35 Citroen DS4 31 Citroen DS5 25 Audi A1 23 Audi A3 47 Audi A4 30 Audi A6 20 Audi A8 18 BMW 1 Series 36 BMW 2 Series 20 BMW 3 Series 53 BMW 4 Series 21 BMW 5 Series 27 BMW 6 Series 24 BMW 7 Series 16 

Sorry, not sure how to place the Dataframe here.

+6
source share
3 answers

use groupby + nlargest

 df.set_index('Model').groupby('Maker')['No Sold(,000s)'].nlargest(2) Maker Model Audi A3 47 A4 30 Citroen C4 46 C1 40 Ford Fiesta 68 Mondeo 55 Honda HRV 33 Civic 32 Nissan Qashqai 62 Juke 57 Toyota Auris 45 Aygo 44 Name: No Sold(,000s), dtype: int64 
+5
source

Alternative solution:

 In [222]: df.sort_values(['Maker', 'No Sold(,000s)'], ascending=[1,0]) \ .groupby('Maker', as_index=False).head(2) Out[222]: Maker Model No Sold(,000s) 33 Audi A3 47 34 Audi A4 30 39 BMW 3 Series 53 37 BMW 1 Series 36 28 Citroen C4 46 26 Citroen C1 40 3 Ford Fiesta 68 4 Ford Mondeo 55 16 Honda HRV 33 15 Honda Civic 32 12 Nissan Qashqai 62 11 Nissan Juke 57 21 Toyota Auris 45 20 Toyota Aygo 44 

PS, please remember: @piRSquared solution is more idiomatic and should be faster

+3
source

I think you could also:

 df[df.groupby(by=['maker'])["no sold(000's)"].rank() <= 2] 
+1
source

Source: https://habr.com/ru/post/1014911/


All Articles