Pandas DataFrame manipulation

I am trying to perform a specific operation on a DataFrame. Given the following DataFrame:

import pandas as pd

df1 = pd.DataFrame({
    'id': [0, 1, 2, 1, 3, 0],
    'letter': ['a', 'b', 'c', 'b', 'b', 'a'],
    'status': [0, 1, 0, 0, 0, 1]})

id letter  status
 0      a       0
 1      b       1
 2      c       0
 1      b       0
 3      b       0
 0      a       1

I would like to create another DataFrame that contains rows from df1 subject to the following rule:
if two or more rows have the same id and letter, keep only the row with status 1. All other rows should be copied unchanged.
The resulting DataFrame should look like this:

id letter  status
 0      a       1
 1      b       1
 2      c       0
 3      b       0

Any help is appreciated. Thank you.

2 answers

This should work:

>>> fn = lambda obj: obj[obj.status == 1] if any(obj.status == 1) else obj
>>> df1.groupby(['id', 'letter'], as_index=False).apply(fn)
   id letter  status
5   0      a       1
1   1      b       1
2   2      c       0
4   3      b       0

[4 rows x 3 columns]
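
The same rule can also be expressed without apply. The following is only a sketch, assuming the df1 from the question; the names group_has_one and result are illustrative and not part of the original answer. It marks every row whose (id, letter) group contains a status of 1, then keeps the status-1 rows plus all rows from groups that have none.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [0, 1, 2, 1, 3, 0],
    'letter': ['a', 'b', 'c', 'b', 'b', 'a'],
    'status': [0, 1, 0, 0, 0, 1],
})

# True for every row whose (id, letter) group contains at least one status == 1
group_has_one = (
    df1['status'].eq(1)
    .groupby([df1['id'], df1['letter']])
    .transform('any')
)

# Keep status-1 rows, plus every row of a group that has no status-1 row
result = df1[df1['status'].eq(1) | ~group_has_one]
print(result.sort_values('id'))
```

Unlike the apply version, the rows keep their original order (index 1, 2, 4, 5) until they are sorted explicitly.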

First sort by status and then use groupby:

In [1932]: df1.sort_values(by='status').groupby('id', as_index=False).last()
Out[1932]:
   id letter  status
0   0      a       1
1   1      b       1
2   2      c       0
3   3      b       0
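
Note that the snippet above groups by id only and always returns exactly one row per group. A possible variant that uses both keys from the question is sketched below. It assumes the df1 from the question and that one row per (id, letter) pair is acceptable; drop_duplicates is swapped in here for groupby(...).last() and is not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [0, 1, 2, 1, 3, 0],
    'letter': ['a', 'b', 'c', 'b', 'b', 'a'],
    'status': [0, 1, 0, 0, 0, 1],
})

result = (
    # Put status-1 rows first; a stable sort keeps the original order within ties
    df1.sort_values('status', ascending=False, kind='stable')
       # Keep the first row seen for each (id, letter) pair
       .drop_duplicates(subset=['id', 'letter'])
       .sort_values('id')
       .reset_index(drop=True)
)
print(result)
```

If a group can contain several rows that should all be kept (for example two status-0 rows with no status-1 row), this variant would drop the extras, so the groupby/apply approach from the first answer fits the stated rule more closely.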

Source: https://habr.com/ru/post/1534725/

