Slicing Pandas Dataframe according to the number of rows

Question

I suppose this is something fairly simple, but I cannot find out how to do it. I have looked through tutorials and Stack Overflow.

Suppose I have a dataframe df, for example:

Group   Id_In_Group   SomeQuantity
1        1              10
1        2              20
2        1               7
3        1              16
3        2              22
3        3               5
3        4              12
3        5              28
4        1               1
4        2              18
4        3              14
4        4               7
5        1              36

I would like to select only the rows of groups that have at least 4 objects (that is, at least 4 rows share the same Group number) and for which the 4th SomeQuantity, with the group sorted by SomeQuantity in ascending order, is at least 20 (for example).

In this dataframe, for example, it would return only group 3, since it has 5 (>= 4) elements and its 4th SomeQuantity (after sorting) is 22 (>= 20), so it should build the dataframe:

Group   Id_In_Group   SomeQuantity
3        1              16
3        2              22
3        3               5
3        4              12
3        5              28

(Whether or not the result is sorted by SomeQuantity does not matter.)

Could anyone kindly help me? :)

+4

python pandas slice dataframe

Matt 06 . '17 11:58

3

Use .groupby() + .filter():

In [66]: df.groupby('Group').filter(lambda x: len(x) >= 4 and x['SomeQuantity'].max() >= 20)
Out[66]:
   Group  Id_In_Group  SomeQuantity
3      3            1            16
4      3            2            22
5      3            3             5
6      3            4            12
7      3            5            28
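
On the sample data, max() >= 20 and the "4th value in ascending order" test happen to select the same group. Where the two could disagree, the condition can be written against the 4th value directly. This is only a minimal sketch: the sample frame is rebuilt by hand, and the helper name fourth_smallest_ok and its parameters n and threshold are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "Group":        [1, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5],
    "Id_In_Group":  [1, 2, 1, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1],
    "SomeQuantity": [10, 20, 7, 16, 22, 5, 12, 28, 1, 18, 14, 7, 36],
})

def fourth_smallest_ok(g, n=4, threshold=20):
    # Hypothetical helper: reject small groups first, so .iloc[n - 1]
    # is only evaluated when the group really has n rows.
    if len(g) < n:
        return False
    # n-th value in ascending order (0-based position n - 1).
    return bool(g["SomeQuantity"].sort_values().iloc[n - 1] >= threshold)

result = df.groupby("Group").filter(fourth_smallest_ok)
print(result)
```

Group 4 has 4 rows but its 4th value is 18, so only group 3 survives.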

+5

MaxU 06 . '17 12:00

First, "at least 4 objects in the group":

import collections

groups = [k for k, v in collections.Counter(df.Group).items() if v > 3]
groups

Out:[3, 4]

Filter df to those groups:

df2 = df[df.Group.isin(groups)]

"the 4th SomeQuantity (after sorting) is 22 (>= 20)"

df3 = df2.sort_values(by='SomeQuantity', ascending=False)

( ...)

df3.groupby('Group').filter(lambda grp: grp['SomeQuantity'].sort_values().iloc[3] >= 20).sort_index()

   Group  Id_In_Group  SomeQuantity
3      3            1            16
4      3            2            22
5      3            3             5
6      3            4            12
7      3            5            28
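
The steps above can be put together as one runnable sketch. The sample frame is rebuilt by hand here, and the comparison is made on the SomeQuantity column only, which is an assumption about what the filter is meant to test.

```python
import collections

import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "Group":        [1, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5],
    "Id_In_Group":  [1, 2, 1, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1],
    "SomeQuantity": [10, 20, 7, 16, 22, 5, 12, 28, 1, 18, 14, 7, 36],
})

# Step 1: group labels that occur at least 4 times.
counts = collections.Counter(df["Group"])
groups = [k for k, v in counts.items() if v >= 4]

# Step 2: keep only the rows of those groups.
df2 = df[df["Group"].isin(groups)]

# Step 3: keep groups whose 4th SomeQuantity (ascending) is >= 20.
result = df2.groupby("Group").filter(
    lambda grp: grp["SomeQuantity"].sort_values().iloc[3] >= 20
)
print(result)
```

Because filter keeps the original row order, no extra sort_index() is needed when df2 has not been re-sorted beforehand.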

+1

ade1e 06 . '17 14:02

Nickil Maveli · Accepted Answer · 2017-01-06T13:39:32+0000

Use map, value_counts, groupby and filter:

import numpy as np

(df[df.Group.map(df.Group.value_counts().ge(4))]
   .groupby('Group')
   .filter(lambda x: np.any(x['SomeQuantity'].sort_values().iloc[3] >= 20)))

Explanation:

value_counts counts how many times each distinct value occurs in the Group column.

>>> df.Group.value_counts()

3    5
4    4
1    2
5    1
2    1
Name: Group, dtype: int64

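map works like a dictionary lookup (keys are mapped to their values), bringing these counts back onto every row of the original DF.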

>>> df.Group.map(df.Group.value_counts())

0     2
1     2
2     1
3     5
4     5
5     5
6     5
7     5
8     4
9     4
10    4
11    4
12    1
Name: Group, dtype: int64

Then we check which counts are 4 or more, which is our threshold, and take only those rows from the whole DF.

>>> df[df.Group.map(df.Group.value_counts().ge(4))]   

    Group  Id_In_Group  SomeQuantity
3       3            1            16
4       3            2            22
5       3            3             5
6       3            4            12
7       3            5            28
8       4            1             1
9       4            2            18
10      4            3            14
11      4            4             7
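
The boolean mask behind this selection can be inspected on its own. The following is only a sketch with the sample frame rebuilt by hand. The transform("size") line is an alternative not used in this answer, shown for comparison.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "Group":        [1, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5],
    "Id_In_Group":  [1, 2, 1, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1],
    "SomeQuantity": [10, 20, 7, 16, 22, 5, 12, 28, 1, 18, 14, 7, 36],
})

counts = df["Group"].value_counts()      # rows per group label
is_big = counts.ge(4)                    # True for labels with >= 4 rows
mask = df["Group"].map(is_big)           # one True/False per row of df
subset = df[mask]

# Alternative (an assumption, not from the answer): per-row group size.
mask_alt = df.groupby("Group")["Group"].transform("size").ge(4)

print(subset)
```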

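To use groupby.filter, the function must return a single boolean for each group: we sort SomeQuantity, take the 4th element and compare it with the threshold of 20. np.any makes sure the comparison is reduced to that single boolean.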

>>> (df[df.Group.map(df.Group.value_counts().ge(4))]
...    .groupby('Group')
...    .apply(lambda x: x['SomeQuantity'].sort_values().iloc[3]))

Group
3    22
4    18
dtype: int64

Note that .iloc[3] uses 0-based positions, so it picks the 4th element.
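
A small sketch of the positional lookup, using the SomeQuantity values of group 3 and of group 1 typed in by hand. It also shows why groups with fewer than 4 rows have to be removed before .iloc[3] is applied.

```python
import pandas as pd

# SomeQuantity values of group 3, in their original order.
s = pd.Series([16, 22, 5, 12, 28])
ordered = s.sort_values()      # ascending: 5, 12, 16, 22, 28
fourth = ordered.iloc[3]       # position 3 is the 4th element

# SomeQuantity values of group 1, which has only 2 rows.
short = pd.Series([10, 20])
try:
    short.sort_values().iloc[3]
    out_of_bounds = False
except IndexError:
    # Position 3 does not exist in a 2-element Series.
    out_of_bounds = True

print(fourth, out_of_bounds)
```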
