Slicing Pandas Dataframe according to the number of rows

I suppose this is something fairly simple, but I cannot find how to do it. I have searched tutorials and Stack Overflow.

Suppose I have a dataframe df, for example:

Group   Id_In_Group   SomeQuantity
1        1              10
1        2              20
2        1               7
3        1              16
3        2              22
3        3               5
3        4              12
3        5              28
4        1               1
4        2              18
4        3              14
4        4               7
5        1              36

I would like to select only the rows of groups that have at least 4 objects (i.e. at least 4 rows with the same Group number) and for which the 4th SomeQuantity in the group, after sorting SomeQuantity in ascending order, is at least 20 (for example).

In this DataFrame, for example, it would only return the 3rd group, since it has 5 (>= 4) elements and its 4th SomeQuantity (after sorting) is 22 (>= 20), so it should build this DataFrame:

Group   Id_In_Group   SomeQuantity
3        1              16
3        2              22
3        3               5
3        4              12
3        5              28

(It does not matter whether or not the result is sorted by SomeQuantity.)

Could anyone help me? :)
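To try the answers below, the sample frame from the question can be rebuilt as follows. This is a minimal sketch: the column names and values are copied from the table above, and the default 0-based integer index is assumed because it matches the index shown in the answers' outputs.

```python
import pandas as pd

# Sample data from the question: one row per object in a group
df = pd.DataFrame({
    'Group':        [1, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5],
    'Id_In_Group':  [1, 2, 1, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1],
    'SomeQuantity': [10, 20, 7, 16, 22, 5, 12, 28, 1, 18, 14, 7, 36],
})
print(df)
```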


Using map, value_counts, groupby and filter:

(df[df.Group.map(df.Group.value_counts().ge(4))]
   .groupby('Group')
   .filter(lambda x: x['SomeQuantity'].sort_values().iloc[3] >= 20))

Explanation:

value_counts returns how many rows there are for each value of Group.

>>> df.Group.value_counts()

3    5
4    4
1    2
5    1
2    1
Name: Group, dtype: int64

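map then replaces each Group value with its count (looked up in the Series above), giving a Series aligned with the rows of the DF: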

>>> df.Group.map(df.Group.value_counts())

0     2
1     2
2     1
3     5
4     5
5     5
6     5
7     5
8     4
9     4
10    4
11    4
12    1
Name: Group, dtype: int64

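Mapping the Boolean Series value_counts().ge(4) instead gives True for every row whose group has at least 4 members, and that mask is used to index the DF: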

>>> df[df.Group.map(df.Group.value_counts().ge(4))]   

    Group  Id_In_Group  SomeQuantity
3       3            1            16
4       3            2            22
5       3            3             5
6       3            4            12
7       3            5            28
8       4            1             1
9       4            2            18
10      4            3            14
11      4            4             7
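The same row-aligned mask can also be built without map, using groupby(...).transform('size'), which broadcasts each group's row count back to every row of that group. This is a swapped-in alternative, not part of the original answer; the sample df is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Group':        [1, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5],
    'Id_In_Group':  [1, 2, 1, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1],
    'SomeQuantity': [10, 20, 7, 16, 22, 5, 12, 28, 1, 18, 14, 7, 36],
})

# Size of each row's group, aligned with df's index (same values as map(value_counts()))
group_size = df.groupby('Group')['Group'].transform('size')

# True for rows whose group has at least 4 members
mask = group_size.ge(4)
print(df[mask])
```

Both approaches produce a Boolean Series indexed like df, so either one can be passed straight to df[...].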

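groupby.filter then keeps only the groups in which the 4th SomeQuantity, after sorting in ascending order, is at least 20. These are the 4th values it compares for each remaining group: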

>>> (df[df.Group.map(df.Group.value_counts().ge(4))]
...     .groupby('Group')
...     .apply(lambda x: x['SomeQuantity'].sort_values().iloc[3]))

Group
3    22
4    18
dtype: int64

.iloc[3] picks the 4th value, because positions are counted from 0.
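A quick check of that 0-based indexing on group 3's values: after sorting, position 3 holds the 4th value. Series.nsmallest(4) returns the 4 smallest values in ascending order, so its last element is the same number; using it is an illustrative alternative, not part of the answer.

```python
import pandas as pd

# SomeQuantity values of group 3 from the question
s = pd.Series([16, 22, 5, 12, 28])

sorted_vals = s.sort_values()        # 5, 12, 16, 22, 28
fourth = sorted_vals.iloc[3]         # 4th value, because positions start at 0
print(fourth)                        # 22

# Equivalent: the largest of the 4 smallest values
print(s.nsmallest(4).iloc[-1])       # 22
```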


Using .groupby() + .filter():

In [66]: df.groupby('Group').filter(lambda x: len(x) >= 4 and x['SomeQuantity'].sort_values().iloc[3] >= 20)
Out[66]:
   Group  Id_In_Group  SomeQuantity
3      3            1            16
4      3            2            22
5      3            3             5
6      3            4            12
7      3            5            28
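If the per-group lambda becomes slow on a large frame, one possible vectorized variant (a sketch, not taken from the answers here) numbers the rows of each group in ascending SomeQuantity order with cumcount() and looks at position 3. Groups with fewer than 4 rows have no position 3, so the "at least 4 objects" condition is handled implicitly. THRESHOLD is an illustrative name for the example value 20 from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Group':        [1, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5],
    'Id_In_Group':  [1, 2, 1, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1],
    'SomeQuantity': [10, 20, 7, 16, 22, 5, 12, 28, 1, 18, 14, 7, 36],
})

THRESHOLD = 20  # example threshold from the question

# Sort by SomeQuantity, then number the rows within each group: 0, 1, 2, ...
ordered = df.sort_values('SomeQuantity')
position = ordered.groupby('Group').cumcount()

# 4th-smallest SomeQuantity per group (only groups with >= 4 rows have one)
fourth = ordered.loc[position == 3].set_index('Group')['SomeQuantity']

# Groups whose 4th-smallest value reaches the threshold
keep = fourth[fourth >= THRESHOLD].index

result = df[df['Group'].isin(keep)]
print(result)
```

Ties in SomeQuantity do not change the result: whichever tied row lands at position 3, its value is the same.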

First, find the groups that have "at least 4 objects in the group":

import collections

# Groups with more than 3 rows, i.e. at least 4 objects
groups = [k for k, v in collections.Counter(df.Group).items() if v > 3]
groups

Out: [3, 4]

Then keep only the rows of df that belong to those groups:

df2 = df[df.Group.isin(groups)]

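Next, apply the condition "the 4th SomeQuantity (after sorting) is 22 (>= 20)". Sort the rows by SomeQuantity:

df3 = df2.sort_values(by='SomeQuantity', ascending=False)

Then filter the groups (the lambda sorts each group on its own, so the previous sort is not strictly needed) and restore the original row order with sort_index():

df3.groupby('Group').filter(lambda grp: grp['SomeQuantity'].sort_values().iloc[3] >= 20).sort_index()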

   Group  Id_In_Group  SomeQuantity
3      3            1            16
4      3            2            22
5      3            3             5
6      3            4            12
7      3            5            28
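All three answers hard-code 4 objects and the threshold 20. Since the question gives 20 only as an example, a parameterized version of the groupby + filter approach may be handy. select_groups and its parameter names are hypothetical, chosen for this sketch; the short-circuiting `and` ensures .iloc[n - 1] is only evaluated for groups with at least n rows.

```python
import pandas as pd

def select_groups(df, n=4, threshold=20, group_col='Group', value_col='SomeQuantity'):
    """Hypothetical helper: keep groups with at least n rows whose
    n-th smallest value_col is >= threshold."""
    def keep(grp):
        # len check first, so iloc[n - 1] never runs on a too-small group
        return len(grp) >= n and grp[value_col].sort_values().iloc[n - 1] >= threshold
    return df.groupby(group_col).filter(keep)

df = pd.DataFrame({
    'Group':        [1, 1, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5],
    'Id_In_Group':  [1, 2, 1, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1],
    'SomeQuantity': [10, 20, 7, 16, 22, 5, 12, 28, 1, 18, 14, 7, 36],
})

print(select_groups(df))                  # group 3 only
print(select_groups(df, threshold=18))    # groups 3 and 4
```

Lowering the threshold to 18 also admits group 4, whose 4th-smallest SomeQuantity is exactly 18.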

Source: https://habr.com/ru/post/1665988/