Pandas DataFrame manipulation

Question


I am trying to perform a specific operation on a DataFrame. Given the following DataFrame:

import pandas as pd

df1 = pd.DataFrame({
    'id': [0, 1, 2, 1, 3, 0],
    'letter': ['a', 'b', 'c', 'b', 'b', 'a'],
    'status': [0, 1, 0, 0, 0, 1]})

id letter  status
 0      a       0
 1      b       1
 2      c       0
 1      b       0
 3      b       0
 0      a       1

I would like to create another DataFrame containing rows from df1 according to the following rule:
if two or more rows have the same id and letter, keep only the row with status 1; all other rows should be copied unchanged.
The resulting DataFrame should look like this:

id letter  status
 0      a       1
 1      b       1
 2      c       0
 3      b       0

Any help is appreciated. Thank you.


python pandas

Zihs Apr 2 '14 at 21:34


2 answers

First sort by status, then use groupby:

In [1932]: df1.sort_values(by='status').groupby(['id', 'letter'], as_index=False).last()
Out[1932]:
   id letter  status
0   0      a       1
1   1      b       1
2   2      c       0
3   3      b       0


Zero Oct 14 '17 at 14:21
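The same sort-then-keep-last idea can also be written with drop_duplicates instead of groupby. This is a minimal sketch, assuming the df1 from the question. It swaps in a stable sort (kind='mergesort') so rows with equal status keep their original order, and like .last() it returns one row per (id, letter) pair:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [0, 1, 2, 1, 3, 0],
    'letter': ['a', 'b', 'c', 'b', 'b', 'a'],
    'status': [0, 1, 0, 0, 0, 1]})

# Stable sort by status: within each (id, letter) pair, status-1 rows end up last.
ordered = df1.sort_values('status', kind='mergesort')

# Keep the last row of each (id, letter) pair, i.e. a status-1 row if one exists,
# then restore ordering by id and renumber the index.
result = (ordered.drop_duplicates(subset=['id', 'letter'], keep='last')
                 .sort_values('id')
                 .reset_index(drop=True))
print(result)
```

Note that this collapses every (id, letter) pair to a single row, so duplicate pairs that have no status-1 row are also reduced to one row.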


behzad.nouri · Accepted Answer · 2014-04-02T21:41:08+0000

This should work:

>>> fn = lambda obj: obj[obj.status == 1] if (obj.status == 1).any() else obj
>>> df1.groupby(['id', 'letter'], as_index=False).apply(fn)
   id letter  status
5   0      a       1
1   1      b       1
2   2      c       0
4   3      b       0

[4 rows x 3 columns]
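The behaviour of fn (keep status-1 rows in groups that have one, keep every row otherwise) can also be expressed as a boolean mask built with groupby(...).transform, which avoids calling apply. This is a minimal sketch of that swapped-in technique, assuming the df1 from the question. It is not the answerer's code:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [0, 1, 2, 1, 3, 0],
    'letter': ['a', 'b', 'c', 'b', 'b', 'a'],
    'status': [0, 1, 0, 0, 0, 1]})

is_one = df1['status'].eq(1)

# For every row: does its (id, letter) group contain at least one status-1 row?
group_has_one = is_one.groupby([df1['id'], df1['letter']]).transform('any')

# Keep status-1 rows from such groups, and every row from the remaining groups.
result = df1[is_one | ~group_has_one].sort_values('id')
print(result)
```

Because the mask is computed per row, groups with no status-1 row keep all of their rows, which matches what fn does. The original index labels (5, 1, 2, 4) are also preserved.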
