I asked a similar question here, but I want to expand on it, because now I have to do something slightly different where I cannot use .duplicated().
I have a df grouped by Key. Within each group, I want to flag the rows where the discharge date of one row matches the admit date of a row in the same group, but only if the row supplying the discharge date has a Num1 value in the range 5-12. Both rows of such a pair should be flagged.
import pandas as pd

df = pd.DataFrame({'Key': ['10003', '10003', '10003', '10003', '10003', '10003', '10034', '10034'],
                   'Num1': [12, 13, 13, 13, 12, 13, 15, 12],
                   'Num2': [121, 122, 122, 124, 125, 126, 127, 128],
                   'admit': [20120506, 20120508, 20121010, 20121010, 20121010, 20121110, 20120520, 20120520],
                   'discharge': [20120508, 20120510, 20121012, 20121016, 20121023, 20121111, 20120520, 20120520]})
# Parse the YYYYMMDD integers into datetimes
df['admit'] = pd.to_datetime(df['admit'], format='%Y%m%d')
df['discharge'] = pd.to_datetime(df['discharge'], format='%Y%m%d')
Initial df:
     Key  Num1  Num2       admit   discharge
0  10003    12   121  2012-05-06  2012-05-08
1  10003    13   122  2012-05-08  2012-05-10
2  10003    13   122  2012-10-10  2012-10-12
3  10003    13   124  2012-10-10  2012-10-16
4  10003    12   125  2012-10-10  2012-10-23
5  10003    13   126  2012-11-10  2012-11-11
6  10034    15   127  2012-05-20  2012-05-20
7  10034    12   128  2012-05-20  2012-05-20
Expected final df:
     Key  Num1  Num2       admit   discharge  flag
0  10003    12   121  2012-05-06  2012-05-08     1
1  10003    13   122  2012-05-08  2012-05-10     1
2  10003    13   122  2012-10-10  2012-10-12     0
3  10003    13   124  2012-10-10  2012-10-16     0
4  10003    12   125  2012-10-10  2012-10-23     0
5  10003    13   126  2012-11-10  2012-11-11     0
6  10034    15   127  2012-05-20  2012-05-20     1
7  10034    12   128  2012-05-20  2012-05-20     1
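One way to express the rule above, as a minimal sketch rather than a definitive answer: a self-merge that pairs every discharge date from a row with Num1 in 5-12 against the admit dates of the same Key, then flags both rows of every pair. The helper column names row_dis, row_adm and date are made up for this illustration. The sketch also assumes a row whose own admit date equals its own discharge date may pair with itself; for the sample data this makes no difference.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Key': ['10003', '10003', '10003', '10003', '10003', '10003', '10034', '10034'],
    'Num1': [12, 13, 13, 13, 12, 13, 15, 12],
    'Num2': [121, 122, 122, 124, 125, 126, 127, 128],
    'admit': [20120506, 20120508, 20121010, 20121010, 20121010, 20121110, 20120520, 20120520],
    'discharge': [20120508, 20120510, 20121012, 20121016, 20121023, 20121111, 20120520, 20120520],
})
df['admit'] = pd.to_datetime(df['admit'], format='%Y%m%d')
df['discharge'] = pd.to_datetime(df['discharge'], format='%Y%m%d')

# Discharge side: only rows with Num1 in 5-12 (inclusive) may supply the date.
dis = (df[df['Num1'].between(5, 12)]
       .reset_index()
       .rename(columns={'index': 'row_dis', 'discharge': 'date'})[['row_dis', 'Key', 'date']])

# Admit side: any row of the group may supply the matching admit date.
adm = (df.reset_index()
       .rename(columns={'index': 'row_adm', 'admit': 'date'})[['row_adm', 'Key', 'date']])

# Inner join on (Key, date): every result row is one discharge/admit pair.
# Assumption: self-pairs (own admit == own discharge) are kept.
pairs = dis.merge(adm, on=['Key', 'date'])

# Flag both members of every pair.
flagged = pd.concat([pairs['row_dis'], pairs['row_adm']]).unique()
df['flag'] = df.index.isin(flagged).astype(int)
print(df)
```

On the sample data this yields the flag column of the expected final df. Because it is a plain join, no groupby or filter() is needed; the cost is that the intermediate pairs table grows with the number of same-date matches per Key.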
I tried using filter() instead, checking each Key group for a matching admit/discharge date and a Num1 value in the range 5-12:
num1_range = [5,6,7,8,9,10,11,12]
df.loc[df.groupby(['Key']).filter(lambda x : (x['admit'] == x['discharge'].any())&(x['Num1'].isin(num1_range).any())),'flag']=1
This raises:
ValueError: cannot set a Timestamp with a non-timestamp
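Two parts of the filter() attempt look suspect, although this is not a claim about exactly which one triggers the ValueError. First, x['discharge'].any() collapses the whole discharge column to a single boolean before the == comparison runs. Second, DataFrameGroupBy.filter() keeps or drops entire groups (its function should return one True/False per group) and returns the surviving rows, so it cannot mark individual rows inside a group. A minimal per-group sketch instead builds a boolean row mask for each Key with Series.isin and stitches the masks back together. The helper group_mask is a hypothetical name, and, as an assumption, a row whose own admit equals its own discharge counts as a match.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Key': ['10003', '10003', '10003', '10003', '10003', '10003', '10034', '10034'],
    'Num1': [12, 13, 13, 13, 12, 13, 15, 12],
    'Num2': [121, 122, 122, 124, 125, 126, 127, 128],
    'admit': [20120506, 20120508, 20121010, 20121010, 20121010, 20121110, 20120520, 20120520],
    'discharge': [20120508, 20120510, 20121012, 20121016, 20121023, 20121111, 20120520, 20120520],
})
df['admit'] = pd.to_datetime(df['admit'], format='%Y%m%d')
df['discharge'] = pd.to_datetime(df['discharge'], format='%Y%m%d')


def group_mask(g: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: one boolean flag per row of a single Key group."""
    in_range = g['Num1'].between(5, 12)
    # Row supplies the discharge date: Num1 in range and its discharge equals
    # some admit date in the same group (assumption: possibly its own).
    dis_side = in_range & g['discharge'].isin(g['admit'])
    # Row supplies the admit date: its admit equals the discharge date of
    # some in-range row in the same group.
    adm_side = g['admit'].isin(g.loc[in_range, 'discharge'])
    return dis_side | adm_side


# One mask per group, concatenated and realigned to the original row order.
mask = pd.concat([group_mask(g) for _, g in df.groupby('Key')])
df['flag'] = mask.reindex(df.index).astype(int)
print(df)
```

Unlike filter(), this keeps every row and only decides the flag value, which is what the expected final df asks for.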