Show first 10 rows of pandas dataframe multi-index

I have a pandas DataFrame with a multi-level index, where the first level is year and the second level is username. I have only one column, which is already sorted in descending order. I want to show the first 2 rows for each value of index level 0.

What I have :

               count
year username                
2010 b         677
     a         505
     c         400
     d         300
 ...
2014 a         100
     b         80

What I want :

               count
year username                
2010 b         677
     a         505
2011 c         677
     d         505
2012 e         677
     f         505
2013 g         677
     i         505
2014 h         677
     j         505
2 answers

Here's an answer. There may be a better way to do this (with indexing?), but I know this works. The principle looks complicated, but it is actually fairly simple:

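  • Index the DataFrame by year and username.
  • Group the DataFrame by year (level=0).
  • For each group of the DataFrame, apply a function (via groupby) that:
    • sorts the group by 'count' with sort_values(by='count')
    • takes the top rows (here 2) by slicing ([-top:]). You could also use tail (tail(top)).
  • Finally, drop the extra index level with droplevel(0).

import pandas as pd

# Test data
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011,2011,2011, 2012, 2012, 2013, 2013, 2014, 2014],
                  'username': ['b','a','a','c','c','d','e','f','g','i','h','j'],
                  'count': [400, 505, 678, 677, 505, 505, 677, 505, 677, 505, 677, 505]})
df = df.set_index(['year','username'])

top = 2
# For each year, sort ascending by 'count' and keep the last `top` rows (the largest counts).
# sort_values is used because newer pandas versions no longer accept sort_index(by=...).
df = df.groupby(level=0).apply(lambda g: g.sort_values(by='count')[-top:])
# apply prepends the group key as an extra index level, so drop it
df.index = df.index.droplevel(0)
df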

               count
year username       
2010 a           505
     a           678
2011 d           505
     c           677
2012 f           505
     e           677
2013 i           505
     g           677
2014 j           505
     h           677
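
If the goal is just the top rows per year, a shorter variant may be enough. This is a minimal sketch rather than part of the answer above: it assumes that sort_values accepts the index level name 'year' alongside the column 'count' (supported in recent pandas versions), and it relies on groupby(...).head(n) keeping rows in their existing order. The name top_per_year is only illustrative.

```python
import pandas as pd

# Same test data as above
df = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014],
                   'username': ['b', 'a', 'a', 'c', 'c', 'd', 'e', 'f', 'g', 'i', 'h', 'j'],
                   'count': [400, 505, 678, 677, 505, 505, 677, 505, 677, 505, 677, 505]})
df = df.set_index(['year', 'username'])

top = 2
# Sort by year ascending and count descending, then keep the first `top` rows of each year.
# head() returns the selected rows in their current order, so no droplevel step is needed.
top_per_year = (
    df.sort_values(['year', 'count'], ascending=[True, False])
      .groupby(level='year')
      .head(top)
)
print(top_per_year)
```

Here the largest count comes first within each year. If the frame is already sorted the way the question describes, df.groupby(level='year').head(top) on its own should give the same selection.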

Here is another approach that does not use groupby. First, the test data:

df = pd.DataFrame({'year': [2010, 2010, 2010, 2011,2011,2011, 2012, 2012, 2013, 2013, 2014, 2014],
              'username': ['b','a','a','c','c','d','e','f','g','i','h','j'],
              'count': [400, 505, 678, 677, 505, 505, 677, 505, 677, 505, 677, 505]})
df = df.set_index(['year','username'])

Next, sort the DataFrame by its index.

df = df.sort_index(level=[0,1])

df
                count
year    username    
2010    a       505
        a       678
        b       400
2011    c       677
        c       505
        d       505
2012    e       677
        f       505
2013    g       677
        i       505
2014    h       677
        j       505

Then define a function:

def head_mi(df, n1=5, n2=2):

    # first n1 values of the outer index level
    top_lev_0 = df.index.levels[0].values[:n1]

    # for each outer value, the first n2 values of the inner index level
    top_lev_1 = [df.loc[ind].index.values[:n2] for ind in top_lev_0]

    # for each outer value, slice the inner level from its first to its last selected label
    # (label slicing relies on the index having been sorted beforehand)
    acc = []
    for ind0, inner in zip(top_lev_0, top_lev_1):
        acc.append(df.loc[(ind0, slice(inner[0], inner[-1])), :])

    return pd.concat(acc)

head_mi(df)

Output:

                count
year    username    
2010    a       505
        a       678
2011    c       677
        c       505
2012    e       677
        f       505
2013    g       677
        i       505
2014    h       677
        j       505 
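
One caveat with the label slice in head_mi: when an inner label repeats, slice(first, last) can return more than n2 rows for a year (for inner labels a, b, b and n2=2, the slice from 'a' to 'b' matches all three rows). A positional variant could look like the sketch below. The name head_mi_positional and the small demo frame are hypothetical, made up for illustration, and this is not the method of the answer above.

```python
import pandas as pd

def head_mi_positional(df, n1=5, n2=2):
    # first n1 distinct outer keys, in order of appearance
    outer = df.index.get_level_values(0).unique()[:n1]
    # keep only the rows whose outer key is one of them
    subset = df[df.index.get_level_values(0).isin(outer)]
    # first n2 rows per outer key, selected by position rather than by label
    return subset.groupby(level=0).head(n2)

# Hypothetical demo data with a repeated inner label ('b') in 2010
demo = pd.DataFrame({'year': [2010, 2010, 2010, 2011, 2011],
                     'username': ['a', 'b', 'b', 'c', 'd'],
                     'count': [5, 4, 3, 2, 1]})
demo = demo.set_index(['year', 'username']).sort_index()

result = head_mi_positional(demo, n1=5, n2=2)
print(result)
```

Because the selection is positional, each year contributes at most n2 rows regardless of duplicate labels.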

Source: https://habr.com/ru/post/1607186/

