After searching several forums for similar issues, it seems that one quick way to apply a conditional statement across a pandas DataFrame is the NumPy function np.where(). I am having problems with the following task:
I have a dataset whose rows look like this:
PatientID Date1 Date2 ICD
1234 12/14/10 12/12/10 313.2, 414.2, 228.1
3213 8/2/10 9/5/12 232.1, 221.0
I am trying to create a conditional statement so that:
1. if strings '313.2' or '414.2' exist in df['ICD'] return 1
2. if strings '313.2' or '414.2' exist in df['ICD'] and Date1>Date2 return 2
3. Else return 0
Given that Date1 and Date2 are both in datetime format, and my data frame is named df, I have the following code:
df['NewColumn'] = np.where(df.ICD.str.contains('313.2|414.2').astype(int), 1, np.where(((df.ICD.str.contains('313.2|414.2').astype(int))&(df['Date1']>df['Date2'])), 2, 0))
However, this code only produces the values 1 and 0 in the new column and never 2. How else can I perform this task?
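For reference, here is a minimal sketch of the intended rules. It assumes the cause is the order of the checks: in the nested np.where above, the outer condition already matches every row containing one of the codes, so the inner branch that would return 2 is only evaluated for rows that cannot match it. The sketch uses np.select with the more specific condition listed first. The third row (PatientID 9999) is a made-up example added only to show the value 1, and the names has_code and date1_later are illustrative. The dots in the pattern are escaped so they match literal periods.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, plus one hypothetical row (9999)
# that has a matching code but Date1 earlier than Date2.
df = pd.DataFrame({
    "PatientID": [1234, 3213, 9999],
    "Date1": pd.to_datetime(["12/14/10", "8/2/10", "1/1/10"], format="%m/%d/%y"),
    "Date2": pd.to_datetime(["12/12/10", "9/5/12", "2/1/10"], format="%m/%d/%y"),
    "ICD": ["313.2, 414.2, 228.1", "232.1, 221.0", "414.2"],
})

# True where ICD contains either code; na=False treats missing values as no match.
has_code = df["ICD"].str.contains(r"313\.2|414\.2", na=False)
date1_later = df["Date1"] > df["Date2"]

# np.select takes the first condition that is True for each row,
# so the stricter rule (code AND Date1 > Date2) must come first.
df["NewColumn"] = np.select(
    [has_code & date1_later, has_code],
    [2, 1],
    default=0,
)
```

The same ordering would also work with nested np.where calls, as long as the combined condition is tested before the plain string match.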