Pandas: finding first occurrences of events in df based on column values and marking as new column values

Question


I have a dataframe that looks like this:

customer_id event_date data 
1           2012-10-18    0      
1           2012-10-12    0      
1           2015-10-12    0      
2           2012-09-02    0      
2           2013-09-12    1      
3           2010-10-21    0      
3           2013-11-08    0      
3           2013-12-07    1     
3           2015-09-12    1

I want to add extra columns, such as 'flag_1' and 'flag_2' below, that let me (and others, once I pass the modified data along) filter it easily.

Flag_1 marks the first appearance of each client in the data set. I implemented it by sorting with dta = dta.sort_values(['customer_id','event_date']) and then using (~dta.duplicated(['customer_id'])).astype(int).
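A minimal sketch of the Flag_1 step, assuming the sample frame above is loaded with event_date parsed as datetimes. pandas' duplicated() returns False for the first row of each customer_id and True for every later one, so the mask has to be inverted with ~ before casting to int. Because customer 1's first two sample rows are not in date order, sorting moves 2012-10-12 to the top, so flag_1 lands on that row rather than on the 2012-10-18 row shown in the expected output.

```python
import pandas as pd

# Sample data from the question
dta = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "event_date": pd.to_datetime([
        "2012-10-18", "2012-10-12", "2015-10-12", "2012-09-02", "2013-09-12",
        "2010-10-21", "2013-11-08", "2013-12-07", "2015-09-12",
    ]),
    "data": [0, 0, 0, 0, 1, 0, 0, 1, 1],
})

# sort_values returns a new frame, so assign the result back
dta = dta.sort_values(["customer_id", "event_date"])

# duplicated() is False on the first row of each customer_id and True afterwards;
# invert it so the first appearance becomes 1 and every repeat becomes 0
dta["flag_1"] = (~dta.duplicated(["customer_id"])).astype(int)
print(dta)
```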

Flag_2 should mark, for each client, the first event where the column data equals 1.
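One vectorized way to get Flag_2, sketched under the assumption that rows are in chronological order within each customer (the names is_one and hits_so_far are illustrative, not from the question): keep a running count of data == 1 rows within each customer_id and flag the row where data is 1 and that count equals 1.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "event_date": pd.to_datetime([
        "2012-10-18", "2012-10-12", "2015-10-12", "2012-09-02", "2013-09-12",
        "2010-10-21", "2013-11-08", "2013-12-07", "2015-09-12",
    ]),
    "data": [0, 0, 0, 0, 1, 0, 0, 1, 1],
})
df = df.sort_values(["customer_id", "event_date"])

# True wherever data == 1
is_one = df["data"].eq(1)

# Running number of data == 1 rows seen so far within each customer
hits_so_far = is_one.astype(int).groupby(df["customer_id"]).cumsum()

# The first hit is the row where data == 1 and the running count is exactly 1
df["flag_2"] = (is_one & hits_so_far.eq(1)).astype(int)
print(df)
```

The is_one mask is needed alongside the count: rows with data 0 that follow the first hit also carry a running count of 1, so the count alone would flag them too.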

Here is how the data should look with the additional columns:

customer_id event_date data flag_1 flag_2
1           2012-10-18    0      1      0
1           2012-10-12    0      0      0
1           2015-10-12    0      0      0
2           2012-09-02    0      1      0
2           2013-09-12    1      0      1
3           2010-10-21    0      1      0
3           2013-11-08    0      0      0
3           2013-12-07    1      0      1
3           2015-09-12    1      0      0

How can I compute "flag_2" in pandas?


python pandas

user · asked 18 … '16 at 14:57


Alexander · Accepted Answer · 2016-02-18T15:09:42+0000

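Use groupby on customer_id to get the row index labels of each group. Then use loc to set flag1 on the first label of each group. For flag2, do the same, but only on the rows where data equals 1.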

# Initialize empty flags
df['flag1'] = 0
df['flag2'] = 0

# Set flag1
groups = df.groupby('customer_id').groups
df.loc[[values[0] for values in groups.values()], 'flag1'] = 1

# Set flag2
groups2 = df.loc[df.data == 1, :].groupby('customer_id').groups
df.loc[[values[0] for values in groups2.values()], 'flag2'] = 1

>>> df
   customer_id  event_date  data  flag1  flag2
0            1  2012-10-18     0      1      0
1            1  2012-10-12     0      0      0
2            1  2015-10-12     0      0      0
3            2  2012-09-02     0      1      0
4            2  2013-09-12     1      0      1
5            3  2010-10-21     0      1      0
6            3  2013-11-08     0      0      0
7            3  2013-12-07     1      0      1
8            3  2015-09-12     1      0      0
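For comparison, a shorter sketch of the same result (not part of the accepted answer) uses GroupBy.cumcount for flag1 and GroupBy.head(1) on the data == 1 rows for flag2. Like the answer, it treats "first" as first in the current row order, so it assumes the frame is already sorted by customer_id and event_date if chronological order matters.

```python
import pandas as pd

# Sample data from the question, in its original row order
df = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "event_date": ["2012-10-18", "2012-10-12", "2015-10-12", "2012-09-02",
                   "2013-09-12", "2010-10-21", "2013-11-08", "2013-12-07",
                   "2015-09-12"],
    "data": [0, 0, 0, 0, 1, 0, 0, 1, 1],
})

# cumcount() numbers the rows 0, 1, 2, ... within each customer_id,
# so position 0 is the first row of that customer
df["flag1"] = df.groupby("customer_id").cumcount().eq(0).astype(int)

# Keep only data == 1 rows, take the first one per customer, and flag those labels
first_hits = df[df["data"] == 1].groupby("customer_id").head(1).index
df["flag2"] = df.index.isin(first_hits).astype(int)
print(df)
```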
