I have a dataframe that looks like this:
customer_id event_date data
1 2012-10-18 0
1 2012-10-12 0
1 2015-10-12 0
2 2012-09-02 0
2 2013-09-12 1
3 2010-10-21 0
3 2013-11-08 0
3 2013-12-07 1
3 2015-09-12 1
I want to add indicator columns, such as 'flag_1' and 'flag_2' below, so that I (and anyone I pass the modified data to) can filter it easily.
flag_1 marks the first appearance of each customer in the data set. I implemented this by sorting:
dta = dta.sort_values(['customer_id','event_date'])
and then using:
dta['flag_1'] = (~dta.duplicated(['customer_id'])).astype(int)
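For reference, here is a minimal runnable sketch of the flag_1 step on the sample data. It assumes the frame is named dta and that event_date has been parsed as a datetime; the reset_index call is my own addition, only there to keep the output tidy. Note that duplicated() returns True for the repeated rows, so it has to be negated to put the 1 on the first appearance.

```python
import pandas as pd

# Sample data copied from the question.
dta = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "event_date": pd.to_datetime([
        "2012-10-18", "2012-10-12", "2015-10-12",
        "2012-09-02", "2013-09-12",
        "2010-10-21", "2013-11-08", "2013-12-07", "2015-09-12",
    ]),
    "data": [0, 0, 0, 0, 1, 0, 0, 1, 1],
})

# sort_values returns a new frame, so the result must be assigned back.
dta = dta.sort_values(["customer_id", "event_date"]).reset_index(drop=True)

# duplicated() is False on the first row of each customer and True on the
# rest, so negate it to get 1 on the first appearance and 0 elsewhere.
dta["flag_1"] = (~dta.duplicated(["customer_id"])).astype(int)

print(dta)
```

Because the rows are sorted by date first, the flag for customer 1 lands on the 2012-10-12 row, which is the earliest one.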
flag_2 should mark, for each customer, the first row where the column data = 1.
An example of what the additional columns should look like:
customer_id event_date data flag_1 flag_2
1 2012-10-18 0 1 0
1 2012-10-12 0 0 0
1 2015-10-12 0 0 0
2 2012-09-02 0 1 0
2 2013-09-12 1 0 1
3 2010-10-21 0 1 0
3 2013-11-08 0 0 0
3 2013-12-07 1 0 1
3 2015-09-12 1 0 0
How can I create "flag_2" in pandas?
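One possible way to express flag_2 as defined above, offered as a sketch rather than the only approach: count the data == 1 rows cumulatively within each customer and flag the row where that running count first reaches 1. It assumes the frame is named dta, that "first" means earliest event_date, and that data only holds 0 or 1; the helper names is_one and running are my own.

```python
import pandas as pd

# Sample data copied from the question.
dta = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "event_date": pd.to_datetime([
        "2012-10-18", "2012-10-12", "2015-10-12",
        "2012-09-02", "2013-09-12",
        "2010-10-21", "2013-11-08", "2013-12-07", "2015-09-12",
    ]),
    "data": [0, 0, 0, 0, 1, 0, 0, 1, 1],
})

# Assumption: "first" means earliest event_date within each customer.
dta = dta.sort_values(["customer_id", "event_date"]).reset_index(drop=True)

# 1 where data == 1, else 0 (hypothetical helper name).
is_one = dta["data"].eq(1).astype(int)

# Running count of data == 1 rows within each customer.
running = is_one.groupby(dta["customer_id"]).cumsum()

# A row gets the flag only if it is a data == 1 row and it is the first
# such row for its customer (running count equals 1).
dta["flag_2"] = ((is_one == 1) & (running == 1)).astype(int)

print(dta)
```

Customers with no data == 1 row, such as customer 1 here, simply get 0 in every row.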