How to use .loc with groupby and two conditions in pandas

I asked a similar question here, but I want to expand on it because I have been asked to do something a little different, where I cannot use .duplicates().

I have a df that is grouped by Key. I want to flag any row within a group where the discharge date matches an admit date. Of the matching rows, the row with the discharge date must have a Num1 value in the range 5-12.

import pandas as pd

df = pd.DataFrame({
    'Key': ['10003', '10003', '10003', '10003', '10003', '10003', '10034', '10034'],
    'Num1': [12, 13, 13, 13, 12, 13, 15, 12],
    'Num2': [121, 122, 122, 124, 125, 126, 127, 128],
    'admit': [20120506, 20120508, 20121010, 20121010, 20121010, 20121110, 20120520, 20120520],
    'discharge': [20120508, 20120510, 20121012, 20121016, 20121023, 20121111, 20120520, 20120520],
})
df['admit'] = pd.to_datetime(df['admit'], format='%Y%m%d')
df['discharge'] = pd.to_datetime(df['discharge'], format='%Y%m%d')

Initial df:

    Key     Num1    Num2    admit       discharge
0   10003   12      121     2012-05-06  2012-05-08
1   10003   13      122     2012-05-08  2012-05-10
2   10003   13      122     2012-10-10  2012-10-12
3   10003   13      124     2012-10-10  2012-10-16
4   10003   12      125     2012-10-10  2012-10-23
5   10003   13      126     2012-11-10  2012-11-11
6   10034   15      127     2012-05-20  2012-05-20
7   10034   12      128     2012-05-20  2012-05-20

Desired final df:

    Key     Num1    Num2    admit       discharge   flag
0   10003   12      121     2012-05-06  2012-05-08  1
1   10003   13      122     2012-05-08  2012-05-10  1
2   10003   13      122     2012-10-10  2012-10-12  0
3   10003   13      124     2012-10-10  2012-10-16  0
4   10003   12      125     2012-10-10  2012-10-23  0
5   10003   13      126     2012-11-10  2012-11-11  0
6   10034   15      127     2012-05-20  2012-05-20  1
7   10034   12      128     2012-05-20  2012-05-20  1

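I tried using filter(), but I am not sure how to combine the date match with the condition that Num1 is in the range 5-12, and I get an error: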

num1_range = [5,6,7,8,9,10,11,12]
df.loc[df.groupby(['Key']).filter(lambda x : (x['admit'] == x['discharge'].any())&(x['Num1'].isin(num1_range).any())),'flag']=1

ValueError: cannot set a Timestamp with a non-timestamp

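As I understand it, there are two scenarios where flag = True:

  • The row's admit date matches a discharge date within the group (Key).
  • The row's discharge date matches an admit date within the group, and Num1 is between 5 and 12.

Here is one way to implement this.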

d1 = df.groupby('Key')['admit'].apply(set).to_dict()
d2 = df.groupby('Key')['discharge'].apply(set).to_dict()

def flagger(row):
    match1, match2 = row['discharge'] in d1[row['Key']], row['admit'] in d2[row['Key']]
    return match2 or (match1 and (row['Num1'] in range(5, 13)))

df['flag'] = df.apply(flagger, axis=1).astype(int)

     Key  Num1  Num2      admit  discharge  flag
0  10003    12   121 2012-05-06 2012-05-08     1
1  10003    13   122 2012-05-08 2012-05-10     1
2  10003    13   122 2012-10-10 2012-10-12     0
3  10003    13   124 2012-10-10 2012-10-16     0
4  10003    12   125 2012-10-10 2012-10-23     0
5  10003    13   126 2012-11-10 2012-11-11     0
6  10034    15   127 2012-05-20 2012-05-20     1
7  10034    12   128 2012-05-20 2012-05-20     1

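  • Create 2 dictionaries mapping each Key → the set of admit dates and Key → the set of discharge dates.
  • Implement the 2 criteria in a function that looks up both dictionaries, and apply it row by row with pd.DataFrame.apply.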
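
The same two criteria can also be written without a row-wise apply. The following is only a sketch, not part of the original answer: the helper name flag_group is hypothetical, the Num2 column is left out for brevity, and the dates are entered as strings. It checks each group's admit and discharge columns against each other with isin.

```python
import pandas as pd

df = pd.DataFrame({
    'Key': ['10003'] * 6 + ['10034'] * 2,
    'Num1': [12, 13, 13, 13, 12, 13, 15, 12],
    'admit': ['20120506', '20120508', '20121010', '20121010',
              '20121010', '20121110', '20120520', '20120520'],
    'discharge': ['20120508', '20120510', '20121012', '20121016',
                  '20121023', '20121111', '20120520', '20120520'],
})
df['admit'] = pd.to_datetime(df['admit'], format='%Y%m%d')
df['discharge'] = pd.to_datetime(df['discharge'], format='%Y%m%d')


def flag_group(group):
    """Return a 0/1 Series for one Key group (hypothetical helper)."""
    # Criterion 1: the row's admit date equals some discharge date in the group.
    admit_hits = group['admit'].isin(group['discharge'])
    # Criterion 2: the row's discharge date equals some admit date in the group
    # and Num1 lies in 5..12 (between() is inclusive on both ends by default).
    discharge_hits = (group['discharge'].isin(group['admit'])
                      & group['Num1'].between(5, 12))
    return (admit_hits | discharge_hits).astype(int)


# Evaluate each group separately, then realign the pieces to the original index.
pieces = [flag_group(g) for _, g in df.groupby('Key')]
df['flag'] = pd.concat(pieces).reindex(df.index)
print(df)
```

Looping over the groups and concatenating the results keeps the original row index, so the flags can be assigned back directly.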

Here is another approach. First, define the conditions as a string:

conditions = "(x['discharge'].isin(x['admit'])) & (x['Num1'] >= 5) & (x['Num1'] <= 12)"

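conditions is a string that is evaluated for each group. Within each key, it checks whether the row's discharge date appears among the admit dates, and whether Num1 of that discharge row is between 5 and 12. Group by Key and evaluate conditions: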

filter = df.groupby('Key').apply(lambda x: pd.eval(conditions))
filter.index = filter.index.droplevel(0)

filter

0     True
1    False
2    False
3    False
4    False
5    False
6    False
7     True
dtype: bool

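filter marks the rows where conditions is true. Next, find the rows whose admit time is equivalent to the discharge time of those rows, by merging the filtered rows back on admit: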

dex = df.merge(df[filter.values],left_on=['Key','admit'],right_on=['Key','discharge'],how='left').dropna().index

Finally, combine the two sets of flags; a row is flagged when either one is True:

df['flag'] = (filter | df.index.isin(dex)).astype(int)

Full code:

conditions = "(x['discharge'].isin(x['admit'])) & (x['Num1'] >= 5) & (x['Num1'] <= 12)"
filter = df.groupby('Key').apply(lambda x: pd.eval(conditions))
filter.index = filter.index.droplevel(0)
dex = df.merge(df[filter.values],left_on=['Key','admit'],right_on=['Key','discharge'],how='left').dropna().index
df['flag'] = (filter | df.index.isin(dex)).astype(int)

Output:

     Key  Num1  Num2      admit  discharge  flag
0  10003    12   121 2012-05-06 2012-05-08     1
1  10003    13   122 2012-05-08 2012-05-10     1
2  10003    13   122 2012-10-10 2012-10-12     0
3  10003    13   124 2012-10-10 2012-10-16     0
4  10003    12   125 2012-10-10 2012-10-23     0
5  10003    13   126 2012-11-10 2012-11-11     0
6  10034    15   127 2012-05-20 2012-05-20     1
7  10034    12   128 2012-05-20 2012-05-20     1
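
The merge step above reads the flagged rows from the index of the merged frame, which lines up with df only when each row finds at most one match. As a sketch of the same logic that avoids this, and not part of the original answer, the (Key, date) pairs can be compared directly with MultiIndex.isin. The variable names here are assumptions, the Num2 column is left out, and the dates are entered as strings.

```python
import pandas as pd

df = pd.DataFrame({
    'Key': ['10003'] * 6 + ['10034'] * 2,
    'Num1': [12, 13, 13, 13, 12, 13, 15, 12],
    'admit': ['20120506', '20120508', '20121010', '20121010',
              '20121010', '20121110', '20120520', '20120520'],
    'discharge': ['20120508', '20120510', '20121012', '20121016',
                  '20121023', '20121111', '20120520', '20120520'],
})
df['admit'] = pd.to_datetime(df['admit'], format='%Y%m%d')
df['discharge'] = pd.to_datetime(df['discharge'], format='%Y%m%d')

# One (Key, date) pair per row, for each of the two date columns.
admit_pairs = pd.MultiIndex.from_frame(df[['Key', 'admit']])
discharge_pairs = pd.MultiIndex.from_frame(df[['Key', 'discharge']])

# Rows whose discharge date is an admit date for the same Key, with Num1 in 5..12.
discharge_side = (discharge_pairs.isin(admit_pairs)
                  & df['Num1'].between(5, 12).to_numpy())

# Rows whose admit date equals the discharge date of one of those rows.
admit_side = admit_pairs.isin(discharge_pairs[discharge_side])

df['flag'] = (discharge_side | admit_side).astype(int)
print(df)
```

Because both masks are computed positionally against df, duplicate matches cannot shift the row alignment.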

Here is another approach, using a function applied to each group:

num1_range = [5,6,7,8,9,10,11,12]

def get_flags(group):
    flagged_discharge_dates=group.loc[group['Num1'].isin(num1_range),'discharge']
    flag=group['admit'].isin(flagged_discharge_dates)
    flag=flag.astype(int)
    return flag

df['flag']=df.groupby('Key',group_keys=False).apply(get_flags)
df

    Key Num1    Num2    admit   discharge   flag
0   10003   12  121 2012-05-06  2012-05-08  0
1   10003   13  122 2012-05-08  2012-05-10  1
2   10003   13  122 2012-10-10  2012-10-12  0
3   10003   13  124 2012-10-10  2012-10-16  0
4   10003   12  125 2012-10-10  2012-10-23  0
5   10003   13  126 2012-11-10  2012-11-11  0
6   10034   15  127 2012-05-20  2012-05-20  1
7   10034   12  128 2012-05-20  2012-05-20  1

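Note that row 0 gets flag 0 here, unlike in the desired output, because this function only flags the rows whose admit date matches a discharge date (of a row with Num1 in range).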

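Edit: a solution without groupby, using only .loc

This sets the flag to "1" for rows where the admit date equals the discharge date and Num1 is between 5 and 12 (inclusive).

Otherwise, the flag is set to 0.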

df.loc[(df['admit'] == df['discharge'] ) & (df['Num1'].isin(num1_range)), 'flag'] = 1
df.loc[~((df['admit'] == df['discharge'] ) & (df['Num1'].isin(num1_range))), 'flag'] = 0
print(df)

Output:

     Key  Num1  Num2      admit  discharge  flag
0  10003    12   121 2012-05-06 2012-05-08   0.0
1  10003    13   122 2012-05-08 2012-05-10   0.0
2  10003    13   122 2012-10-10 2012-10-12   0.0
3  10003    13   124 2012-10-10 2012-10-16   0.0
4  10003    12   125 2012-10-10 2012-10-23   0.0
5  10003    13   126 2012-11-10 2012-11-11   0.0
6  10034    15   127 2012-05-20 2012-05-20   0.0
7  10034    12   128 2012-05-20 2012-05-20   1.0

You can see that only the last row satisfies the condition and has its flag set to "1".
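
The two .loc assignments can also be collapsed into one boolean expression, which gives integer flags instead of 0.0 and 1.0. This is only a sketch on a reduced copy of the data (rows 0, 1, 6 and 7 of the original frame, without Key and Num2), and the names same_day and in_range are assumptions.

```python
import pandas as pd

# Reduced example: rows 0, 1, 6 and 7 of the original frame.
df = pd.DataFrame({
    'Num1': [12, 13, 15, 12],
    'admit': pd.to_datetime(['2012-05-06', '2012-05-08', '2012-05-20', '2012-05-20']),
    'discharge': pd.to_datetime(['2012-05-08', '2012-05-10', '2012-05-20', '2012-05-20']),
})

same_day = df['admit'] == df['discharge']   # row-wise comparison, no grouping
in_range = df['Num1'].between(5, 12)        # inclusive on both ends by default
df['flag'] = (same_day & in_range).astype(int)
print(df)
```

Like the .loc version, this compares admit and discharge within the same row only, so it does not look at other rows of the group.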
Hope this helps.


Source: https://habr.com/ru/post/1694625/

